Language Guided Graph Transformer for Skeleton Action Recognition
ABSTRACT. The Transformer is a neural network architecture based on a self-attention mechanism; it was developed primarily for natural language processing and is now being introduced to the computer vision domain. However, the Transformer has not been widely applied to human action recognition. Action recognition is typically treated as a single classification task, and existing recognition algorithms do not fully leverage the semantic relationships within actions. In this paper, we propose a new method called LGGT that combines textual information with a Graph Transformer to incorporate semantic guidance into skeleton-based action recognition. LGGT employs a Graph Transformer as the encoder for skeleton data to extract feature representations and effectively capture long-distance dependencies between joints. Additionally, LGGT utilizes a large-scale language model as a knowledge engine to generate textual descriptions specific to different actions, capturing the semantic relationships between actions and improving the model's ability to understand, recognize, and classify them accurately. We extensively evaluate the proposed method on the Smoking, Kinetics-Skeleton, and NTU RGB+D action recognition datasets. The experimental results demonstrate significant performance improvements on these datasets, and the ablation study shows that introducing semantic guidance further enhances the model's performance.
Gradient Coupled Flow: Performance Boosting on Network Pruning by Utilizing Implicit Loss Decrease
ABSTRACT. Network pruning prior to training makes generalization more challenging than ever, while recent studies mainly focus on the trainability of the pruned networks in isolation. This paper explores a new perspective: the implicit decrease in loss on the data yet to be trained that is caused by one-batch training in each round, whose first-order approximation we term gradient coupled flow. We then present a criterion sensitive to gradient coupled flow (GCS), which is hypothesized to capture the weights most sensitive to performance boosting at initialization. Evaluations are conducted on multiple datasets, including CIFAR-10/100, Tiny-ImageNet, and MNIST, with VGG and ResNet architectures. GCS achieves decent performance in both single pruning and iterative pruning before training. In addition, a variant of GCS called GCS-Group performs better at low compression ratios, further confirming the role of the implicit loss decrease. Interestingly, our exploration shows a linear correlation between generalization and implicit-loss-decrease-based measurements for previous works as well as GCS, which describes the causes of accuracy fluctuation in a fine-grained manner.
Learnable Color Image Zero-Watermarking Based on Feature Comparison
ABSTRACT. Zero-watermarking is one of the solutions for protecting the copyright of color images without tampering with them. Existing zero-watermarking algorithms either rely on static classical techniques or employ pre-trained deep learning models, which limits the adaptability of zero-watermarking to complex and dynamic environments; these algorithms are prone to failure when encountering novel or complex noise. To address this issue, we propose a self-supervised anti-noise learning color image zero-watermarking method that leverages feature matching to achieve lossless protection of images. In our method, we use a learnable feature extractor and a baseline feature extractor and compare the features extracted by the two. Moreover, we introduce a combined weighted noise layer to enhance robustness against combined noise attacks. Extensive experiments show that our method outperforms other methods in terms of effectiveness and efficiency.
Assessing and Enhancing LLMs: A Physics and History Dataset and One-More-Check Pipeline Method
ABSTRACT. Large language models (LLMs) demonstrate significant capabilities in traditional natural language processing (NLP) tasks and many examinations. However, there are few evaluations concerning specific subjects in the Chinese educational context. This study, focusing on secondary-school physics and history, explores the potential and limitations of LLMs in Chinese education. Our contributions are as follows: a PH dataset is established, covering secondary-school physics and history in Chinese and comprising thousands of multiple-choice questions; three prevalent LLMs (ChatGPT, GPT-3, and ChatGLM) are evaluated on the PH dataset; a new prompting method called One-More-Check (OMC) is proposed to enhance the logical reasoning capacity of LLMs; finally, the three LLMs are set to take an actual secondary-school history exam. Our findings suggest that the OMC method improves the performance of LLMs on logical reasoning and that LLMs underperform the average level of age-appropriate students on the history exam. All datasets, code, and evaluation results are available at https://github.com/hcffffff/PH-dataset-OMC.
Sub-Instruction and Local Map Relationship Enhanced Model for Vision and Language Navigation
ABSTRACT. In this paper, unlike most methods in vision-and-language navigation, which rely primarily on vision-language cross-modal attention modeling and the agent's egocentric observations, we establish connections between sub-instructions and local maps to elaborately encode environment information and learn a path responsible for the whole instruction rather than only the ultimate goal. We first obtain a local semantic map by ground-projecting the RGB semantic segmentation map and the depth map. Each segmented sub-instruction is passed through the sub-instruction attention module and then taken as input, together with the local map, to the cross-modal attention module. Finally, a set of waypoints is predicted by the navigation module until all sub-instructions of the long instruction have been executed, which completes an episode. Comparison experiments and ablation studies on the VLN-CE dataset show that our method outperforms most methods and has a strong ability to predict the whole path.
STFormer: Cross-Level Feature Fusion in Object Detection
ABSTRACT. Object detection algorithms can benefit from multi-level features, which encompass both high-level semantic information and low-level location details. However, existing detection methods face numerous challenges in effectively utilizing these multi-level features. Most existing techniques rely on simplistic operations such as feature addition or concatenation to fuse multi-level features and therefore fail to suppress redundant information effectively, so their performance is significantly constrained in complex scenarios. To address these limitations, this paper presents a novel feature extraction network that incorporates joint modeling and multi-dimensional feature fusion. Specifically, the network partitions the features of each level into tiles and employs hybrid self-attention mechanisms to extract these grouped features more comprehensively. Additionally, a hybrid cross-attention-based approach regulates the transmission proportion of each grouped feature, facilitating the seamless integration of high-level semantic features obtained from deep encoders with the low-level position details retained by the pipeline. Consequently, the network effectively suppresses noise and improves performance. Experimental evaluation on the MS COCO dataset demonstrates the effectiveness of the proposed approach, which achieves an accuracy of 54.3%. Notably, the algorithm shows exceptional performance in detecting small-scale targets, surpassing other state-of-the-art technologies.
Social-CVAE: Pedestrian Trajectory Prediction using Conditional Variational Auto-Encoder
ABSTRACT. Pedestrian trajectory prediction is a fundamental task in applications such as autonomous driving, robot navigation, and advanced video surveillance. Since human motion behavior is inherently unpredictable, resembling a process of decision-making and intrinsic motivation, it naturally exhibits multimodality and uncertainty. Therefore, predicting multi-modal future trajectories in a reasonable manner poses challenges. The goal of multi-modal pedestrian trajectory prediction is to forecast multiple socially plausible future motion paths based on the historical motion paths of agents. In this paper, we propose a multi-modal pedestrian trajectory prediction method based on conditional variational auto-encoder. Specifically, the core of the proposed model is a conditional variational auto-encoder architecture that learns the distribution of future trajectories of agents by leveraging random latent variables conditioned on observed past trajectories. The encoder models the channel and temporal dimensions of historical agent trajectories sequentially, incorporating channel attention and self-attention to dynamically extract spatio-temporal features of observed past trajectories. The decoder is bidirectional, first estimating the future trajectory endpoints of the agents and then using the estimated trajectory endpoints as the starting position for the backward decoder to predict future trajectories from both directions, reducing cumulative errors over longer prediction ranges. The proposed model is evaluated on the widely used ETH/UCY pedestrian trajectory prediction benchmark and achieves state-of-the-art performance.
Distributed Training of Deep Neural Networks: Convergence and Case Study
ABSTRACT. Deep neural network training on a single machine has become increasingly difficult due to a lack of computational power. Fortunately, distributed training of neural networks can be performed with model and data parallelism and sub-network training. This paper introduces a mathematical framework to study the convergence of distributed asynchronous training of deep neural networks with a focus on sub-network training. This article also studies the convergence conditions in synchronous and asynchronous modes. An asynchronous and lock-free training version of the sub-network training is proposed to validate the theoretical study. Experiments were conducted on two well-known public datasets, namely Google Speech and MaFaulDa, using the Jean Zay supercomputer of GENCI. The results indicate that the proposed asynchronous sub-network training approach, with 64 GPUs, achieves faster convergence time and better generalization than the synchronous approach.
Improving Handwritten Mathematical Expression Recognition via an Attention Refinement Network
ABSTRACT. Handwritten mathematical expression recognition (HMER), typically regarded as a sequence-to-sequence problem, has made great progress in recent years, with RNN-based models widely adopted. Although Transformer-based models have demonstrated success in many areas, their performance in HMER is not satisfactory owing to issues with the standard attention mechanism. We therefore propose an attention refinement network in the Transformer framework to improve HMER performance. We first adopt shifted window attention (SWA) from the Swin Transformer to capture spatial context of the whole image. Moreover, we propose a refined coverage attention (RCA) to overcome the lack of coverage in the standard attention mechanism, where we utilize a convolutional kernel with a gating function to obtain coverage features. With the proposed RCA, we refine coverage attention to alleviate the problem of repeatedly attending to the same areas in long sequences. In addition, we utilize a pyramid data augmentation method to generate mathematical expression images at multiple resolutions and enhance model generalization. We evaluate the proposed attention refinement network on the CROHME 2014/2016/2019 HMER benchmark datasets, and extensive experiments demonstrate its effectiveness.
ABSTRACT. JPEG compression brings artifacts into the compressed image, which not only degrade visual quality but also affect the performance of other image processing tasks. To address this issue, many learning-based compression artifact removal methods have been developed in recent years, with remarkable success. However, existing learning-based methods generally exploit only spatial information and lack exploration of frequency-domain information. Exploring frequency-domain information is critical because JPEG compression is actually performed in the frequency domain using the Discrete Cosine Transform (DCT). To effectively leverage information from both the spatial and frequency domains, we propose a novel Dual-Domain Learning Network for JPEG artifacts removal (D2LNet). Our approach first transforms the spatial-domain image to the frequency domain by the fast Fourier transform (FFT). We then introduce two core modules, an Amplitude Correction Module (ACM) and a Phase Correction Module (PCM), which facilitate interactive learning of spatial and frequency-domain information. Extensive experiments on color and grayscale images clearly demonstrate that our method achieves better results than previous state-of-the-art methods. Code will be available at https://github.com/YeunkSuzy/JPEG.
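The dual-domain idea rests on decomposing a feature map into FFT amplitude and phase before correcting each separately. Below is a minimal sketch of that decomposition and its inverse only, not of the ACM/PCM modules themselves; the tensor shapes and the orthonormal FFT normalization are assumptions.

```python
import torch

def to_freq(x):
    """Split a spatial tensor (B, C, H, W) into FFT amplitude and phase."""
    spec = torch.fft.fft2(x, norm="ortho")
    return spec.abs(), spec.angle()

def to_spatial(amplitude, phase):
    """Recombine (possibly corrected) amplitude and phase back into the spatial domain."""
    spec = torch.polar(amplitude, phase)          # amplitude * exp(i * phase)
    return torch.fft.ifft2(spec, norm="ortho").real

x = torch.randn(1, 3, 64, 64)
amp, pha = to_freq(x)
x_rec = to_spatial(amp, pha)                      # reconstructs x up to numerical error
print(torch.allclose(x, x_rec, atol=1e-5))
```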
Dual Channel Graph Neural Network Enhanced by External Affective Knowledge for Aspect Level Sentiment Analysis
ABSTRACT. Aspect-level sentiment analysis is a prominent technology in natural language processing (NLP) that analyzes the sentiment polarity of aspect words in a text. Despite its long history of development, current methods still have some shortcomings. Mainly, they lack the integration of external affective knowledge, which is crucial for allocating attention to aspect-related words in syntactic and semantic information processing. Additionally, the synergy between syntactic and semantic information is often neglected, with most approaches focusing on only one dimension. To address these issues, we propose a knowledge-enhanced dual-channel graph neural network. Our model incorporates external affective knowledge into both the semantic and syntactic channels in different ways, then utilizes a dynamic attention mechanism to fuse information from these channels. We conducted experiments on the SemEval 2014, 2015, and 2016 datasets, and the results show significant improvements compared to existing methods. Our approach bridges the gaps in current techniques and enhances performance in aspect-level sentiment analysis.
Reversible data hiding based on adaptive embedding with local complexity
ABSTRACT. In recent years, most reversible data hiding (RDH) algorithms have considered the impact of texture information on embedding performance. The distortion caused by embedding secret data in a smooth region of the image is much smaller than in a non-smooth region, because embedding in the smooth region corresponds to fewer invalid shifting pixels (ISPs) in histogram shifting. However, in existing schemes the local complexity is not calculated precisely enough, which results in inaccurate texture division and limits the reduction of distortion. Therefore, a new RDH scheme based on adaptive embedding with local complexity (AELC) is proposed to improve embedding performance effectively. Specifically, the cover image is first divided into two subsets by a checkerboard pattern. Then the local complexity of each pixel is computed from the correlation between adjacent pixels (CBAP) to improve calculation accuracy. Finally, secret data are adaptively and preferentially embedded into the regions with lower local complexity in each subset. Experimental results show that the proposed algorithm performs best in terms of invalid shifting pixels, maximum embedding capacity (EC), and peak signal-to-noise ratio (PSNR) compared with state-of-the-art RDH methods.
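The abstract does not give the exact CBAP formula, so the sketch below is only an assumed illustration of the two ingredients it names: the checkerboard division of the cover image and a neighbour-based local-complexity score used to order embedding positions (the four-neighbour variance is an assumption).

```python
import numpy as np

def checkerboard_split(img):
    """Divide a grayscale cover image into the two checkerboard subsets."""
    rows, cols = np.indices(img.shape)
    mask = (rows + cols) % 2 == 0
    return mask, ~mask                      # "black" and "white" cells

def local_complexity(img):
    """Assumed complexity score: variance of the four cross neighbours of each pixel."""
    p = np.pad(img.astype(np.float64), 1, mode="edge")
    neigh = np.stack([p[:-2, 1:-1], p[2:, 1:-1], p[1:-1, :-2], p[1:-1, 2:]])
    return neigh.var(axis=0)                # low values = smooth region, embed there first

img = np.random.randint(0, 256, (8, 8), dtype=np.uint8)
black, white = checkerboard_split(img)
order = np.argsort(local_complexity(img)[black])   # embedding order inside one subset
print(order[:5])
```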
Graph-based Vehicle Keypoint Attention Model for Vehicle Re-identification
ABSTRACT. Vehicle re-identification is the task of locating a particular vehicle image among a set of images of vehicles captured from different cameras. In recent years, many methods focus on learning distinctive global features by incorporating keypoint details to improve re-identification accuracy. However, these methods do not take into account the relation between different keypoints and the relation between keypoints and the overall vehicle. To address this limitation, we propose the Graph-based Vehicle Keypoint Attention (GVKA) model that integrates keypoint features and two relation components to yield robust and discriminative representations of vehicle images. The model extracts keypoint features using a pre-trained model, models the relation among keypoint features using a Graph Convolutional Network, and employs cross-attention to highlight important areas of the vehicle and establish the relation between keypoint features and the overall vehicle. Our experimental results on three large-scale datasets demonstrate the effectiveness of our proposed method.
Multi-Mobile Object Motion Coordination with Reinforcement Learning
ABSTRACT. Multi-mobile Object Motion Coordination (MOMC) refers to the task of controlling multiple moving objects to travel from their respective starting stations to their terminal stations. In this task, the objects are expected to complete their travel in a shorter amount of time with no collisions. Multi-mobile object motion coordination plays an important role in application scenarios such as production, processing, warehousing, and logistics. The problem can be modeled as a Markov decision process and solved by reinforcement learning. Current mainstream methods suffer from long training times and poor policy stability, both of which decrease the reliability of practical applications. To address these problems, we introduce a State with Time (ST) model and a Dynamically Update Reward (DUR) model. Experimental results show that the ST model enhances the stability of learned policies, the DUR model improves training efficiency and policy stability, and the motion coordination solutions obtained by our algorithm are superior to those of similar algorithms.
Multi-Feature Integration Neural Network with Two-Stage Training for Short-Term Load Forecasting
ABSTRACT. Accurate short-term load forecasting (STLF) helps the power sector conduct generation and transmission efficiently and maintain stable grid operation while reducing energy waste, thus supporting sustainable development. However, short-term load forecasting is subject to complex temporal dynamics and many environmental variables, which cause considerable difficulties for the power sector; it is therefore an essential yet challenging task. In this paper, we propose a short-term load forecasting model that integrates historical load, environmental variables, and temporal information, named TCN-GRU-TEmb. Our method utilizes a temporal convolutional network (TCN) to capture the regularity of historical loads and a gated recurrent unit (GRU) to extract useful features from environmental variables. As for temporal information, we propose a temporal embedding (TEmb) self-learning module, which can automatically capture the power consumption patterns of different time periods. We further propose a two-stage training algorithm to facilitate model convergence. Comparison experiments show that our model outperforms all baselines, with average reductions in MAE, MAPE, and RMSE of 8.24%, 9.23%, and 7.48%, respectively. A further experiment verifies the effectiveness of the proposed temporal embedding method and two-stage training algorithm.
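As an illustration of the temporal-embedding idea (one learnable vector per time slot), a minimal sketch follows. The slot granularity (hour of day, day of week), the embedding size, and the additive fusion are assumptions; the paper's full TEmb module and two-stage training are not reproduced here.

```python
import torch
import torch.nn as nn

class TemporalEmbedding(nn.Module):
    """Learnable embeddings for hour-of-day and day-of-week slots (assumed granularity)."""
    def __init__(self, dim=16):
        super().__init__()
        self.hour = nn.Embedding(24, dim)
        self.weekday = nn.Embedding(7, dim)

    def forward(self, hour_idx, weekday_idx):
        # Both index tensors have shape (batch, seq_len); the sum is the slot representation.
        return self.hour(hour_idx) + self.weekday(weekday_idx)

emb = TemporalEmbedding(dim=16)
hours = torch.randint(0, 24, (4, 96))
days = torch.randint(0, 7, (4, 96))
print(emb(hours, days).shape)   # torch.Size([4, 96, 16])
```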
New predefined-time stability theorem and applications to the fuzzy stochastic memristive neural networks with impulsive effects
ABSTRACT. This paper mainly investigates the problem of achieving predefined-time synchronization for fuzzy memristive neural networks with both impulsive effects and stochastic disturbances. First, because existing predefined-time stability theorems can hardly be applied to systems with impulsive effects, a new predefined-time stability theorem is proposed to solve the stability problem of such systems. The theorem is flexible and can guide impulsive stochastic fuzzy memristive neural network models to achieve predefined-time synchronization. Second, because the sign function can easily cause the chattering phenomenon, leading to undesirable effects such as decreased synchronization performance, a novel and effective feedback controller without the sign function is designed to eliminate this chattering. In addition, the paper overcomes the combined influence of fuzzy logic, memristive state dependence, and stochastic disturbance, and gives effective conditions ensuring that the two stochastic systems achieve predefined-time synchronization. Finally, the effectiveness of the proposed theoretical results is demonstrated in detail through a numerical simulation.
I-RAFT: Optical Flow Estimation Model Based on Multi-scale Initialization Strategy
ABSTRACT. Optical flow estimation is a fundamental task in computer vision, and recent advances in deep learning networks have led to significant performance improvements. However, existing models that employ recurrent neural networks to update the optical flow from an initial value of zero suffer from instability and slow training. To address this, we propose a simple yet effective optical flow initialization module as the initialization stage, leading to an optical flow estimation model named I-RAFT. Our approach draws inspiration from other successful computer vision algorithms for tackling the multi-scale problem. By extracting initial optical flow values from the 4D cost volume and employing a voting module, we achieve the initialization. Importantly, the initialization module can be seamlessly integrated into other optical flow estimation models. Additionally, we introduce a novel multi-scale extraction module for capturing context features. Extensive experiments demonstrate the simplicity and effectiveness of the proposed model: I-RAFT achieves state-of-the-art performance on the Sintel dataset and the second-best performance on the KITTI dataset, with 24.48% fewer parameters than the previous state-of-the-art MatchFlow model. We have made our code publicly available to facilitate further research and development.
Educational Pattern Guided Self-Knowledge Distillation for Siamese Visual Tracking
ABSTRACT. Existing Siamese-based trackers divide visual tracking into two stages, i.e., feature extraction (backbone subnetwork) and prediction (head subnetwork).
However, they mainly implement task-level supervision (classification and regression) and barely consider feature-level supervision in the knowledge learning process, which can result in deficient knowledge interaction among the features of the tracker's targets and in background interference during online tracking.
To solve these issues, this paper proposes an educational pattern-guided self-knowledge distillation methodology that guides Siamese-based trackers to learn feature knowledge by themselves and can serve as a generic training protocol to improve any Siamese-based tracker. Our key insight is to utilize two educational self-distillation patterns, i.e., focal self-distillation and discriminative self-distillation, to educate the tracker to possess self-learning ability.
The focal self-distillation pattern educates the tracking network to focus on valuable pixels and channels by decoupling the spatial learning and channel learning of target features.
The discriminative self-distillation pattern aims at maximizing the discrimination between foreground and background features, ensuring that the trackers are unaffected by background pixels.
As one of the first attempts to introduce self-knowledge distillation into the visual tracking field, our method is effective and efficient and has a strong generalization ability, which might be instructive for other research. Codes and data are publicly available.
Removing Double Descent with Data-dependent Regularization under Non-Asymptotic View
ABSTRACT. The surprising double-descent phenomenon has drawn public attention in recent years: the prediction error rises and drops as we increase either the sample size or the model size. This paper shows, both theoretically and empirically, that these phenomena can be alleviated by using optimal dropout in the linear regression model ${y}=X{\beta}^0+{\epsilon}$ with $X\in\mathbb{R}^{n\times p}$. We obtain the optimal dropout hyperparameter by estimating the ground truth ${\beta}^0$ with the generalized ridge-type estimator $\hat{{\beta}}=(X^TX+\alpha\cdot\mathrm{diag}(X^TX))^{-1}X^T{y}$.
Moreover, we empirically show that optimal dropout can achieve a monotonic test error curve in nonlinear neural networks on CIFAR-10. Our results suggest considering dropout for risk-curve scaling when the peak phenomenon is met. In addition, we explain why previous deep learning models often do not encounter double-descent scenarios: a usual regularization approach such as dropout is already applied. To the best of our knowledge, this paper is the first to analyze the relationship between data-dependent regularization and double descent.
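The quoted estimator can be computed directly; below is a minimal NumPy sketch on synthetic data, with an arbitrary value of alpha rather than the optimal hyperparameter derived in the paper.

```python
import numpy as np

def dropout_ridge_estimator(X, y, alpha):
    """Generalized ridge estimator beta_hat = (X^T X + alpha * diag(X^T X))^{-1} X^T y."""
    gram = X.T @ X
    penalty = alpha * np.diag(np.diag(gram))     # data-dependent diagonal penalty
    return np.linalg.solve(gram + penalty, X.T @ y)

rng = np.random.default_rng(0)
n, p = 50, 80                                    # over-parameterized regime (p > n)
X = rng.standard_normal((n, p))
beta0 = rng.standard_normal(p)
y = X @ beta0 + 0.1 * rng.standard_normal(n)
beta_hat = dropout_ridge_estimator(X, y, alpha=0.5)
print(np.linalg.norm(beta_hat - beta0))          # estimation error for this alpha
```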
FE-YOLOv5: Improved YOLOv5 Network for Multi-scale Drone-captured Scene Detection
ABSTRACT. Due to the varying angles and heights of UAV shooting, the complex shooting environments, and the predominantly small targets, object detection in drone-captured scenes remains challenging. In this study, we present a highly precise technique for identifying objects in scenes captured by drones, which we refer to as FE-YOLOv5. First, to optimize cross-scale feature fusion and maximize the utilization of shallow feature information, we propose a novel feature pyramid model called MSF-BiFPN. Furthermore, to improve the fusion of features at different scales and boost their representational power, we propose an adaptive attention module. Moreover, we propose a feature enhancement module that strengthens high-level features before feature fusion; it minimizes feature loss during the fusion process and ultimately improves detection accuracy. Finally, the normalized Wasserstein distance is used as the metric to enhance the model's sensitivity and accuracy in detecting small targets. Experimental results of FE-YOLOv5 on the VisDrone dataset show that mAP@0.5 increases by 7.8% and mAP@0.5:0.95 by 5.7%. At the same time, the results of the model trained at 960×960 image resolution are better than those of current YOLO-series models, with mAP@0.5 reaching 56.3%. These experiments demonstrate that the FE-YOLOv5 model effectively enhances the accuracy of object detection in UAV-captured scenes.
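For reference, the normalized Wasserstein distance is commonly defined by modelling each box as a 2D Gaussian and exponentiating the negative 2-Wasserstein distance between the two Gaussians. The sketch below follows that common formulation and may differ in detail from the variant used in FE-YOLOv5; the constant c is dataset-dependent.

```python
import numpy as np

def normalized_wasserstein_distance(box_a, box_b, c=12.8):
    """NWD between two boxes given as (cx, cy, w, h), following the common
    Gaussian-modeling formulation; c is a dataset-dependent constant."""
    va = np.array([box_a[0], box_a[1], box_a[2] / 2.0, box_a[3] / 2.0])
    vb = np.array([box_b[0], box_b[1], box_b[2] / 2.0, box_b[3] / 2.0])
    w2 = np.linalg.norm(va - vb)          # 2-Wasserstein distance between the Gaussians
    return np.exp(-w2 / c)                # maps to (0, 1]; higher means more similar

print(normalized_wasserstein_distance((10, 10, 4, 4), (11, 10, 4, 4)))
```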
Attention-Based Deep Convolutional Network for Speech Recognition under Multi-scene Noise Environment
ABSTRACT. One goal of Automatic Speech Recognition (ASR) is to convert human speech commands into computer-readable input, but noise interference remains an important yet challenging problem. By capturing speech context, deep neural networks have proven superior at identifying specific command words. Existing deep neural networks generally rely on a two-stage structure in which separate layers are used to identify the noisy environment and the speech, respectively, which makes the model large and complex. In addition, their performance generally drops dramatically in unknown noisy environments, which restricts the generalization of these methods. In this paper, we propose a novel deep framework, named Adaptive-Attention and Joint Supervision (AJS), to circumvent the above challenges. Specifically, we use the spectrogram as input. Adaptive attention is employed to refine the features from the noisy environment and remove the limitation of the noise scene. Furthermore, a combination of coarse-to-fine losses is adopted to process difficult words step by step. Extensive experiments on four public datasets demonstrate the robustness of our method to various noise environments and its superior ASR accuracy. Code is available at: https://github.com/zhishulin/bajs.
Exploring the Integration of Large Language Models into Automatic Speech Recognition Systems: An Empirical Study
ABSTRACT. This paper explores the integration of Large Language Models (LLMs) into Automatic Speech Recognition (ASR) systems to improve transcription accuracy. The increasing sophistication of LLMs, with their in-context learning capabilities and instruction-following behavior, has drawn significant attention in the field of Natural Language Processing (NLP). Our primary focus is to investigate the potential of using an LLM's in-context learning capabilities to enhance the performance of ASR systems, which currently face challenges such as ambient noise, speaker accents, and complex linguistic contexts. We designed a study using the Aishell-1 and LibriSpeech datasets, with ChatGPT and GPT-4 serving as benchmarks for LLM capabilities. Unfortunately, our initial experiments did not yield promising results, indicating the complexity of leveraging LLMs' in-context learning for ASR applications. Despite further exploration with varied settings and models, the corrected sentences from the LLMs frequently resulted in higher word error rates (WER), demonstrating the limitations of LLMs in speech applications. This paper provides a detailed overview of these experiments, their results, and their implications, establishing that using LLMs' in-context learning capabilities to correct potential errors in speech recognition transcriptions remains a challenging task at the current stage.
ABSTRACT. Lip reading is a fine-grained video understanding task that endeavors to recognize speech content by analyzing the movement of the speaker's mouth. In recent times, 3D-ResNet-18 has become the favored front-end network for most of the lip reading methods. However, a single 3D CNN layer within the 3D-ResNet-18-based front-end network might not have enough representation power to extract temporal features. To address this issue, we propose the incorporation of Temporal Adaptive Module (TAM) into the front-end network of lip reading methods. TAM is an uncomplicated temporal module that consists of two branches: a local branch that provides location-sensitive information, and a global branch that focuses on capturing long-term temporal dependencies. This combination of branches helps capture complex temporal structures and facilitates robust temporal modeling. Taking global and local relationships into consideration explicitly improves the feature representation. It can be easily used in classical building blocks of networks. We conducted ablation studies to determine the optimal TAM structure and compared our results with various related approaches on the LRW dataset. Our experimental outcomes prove the superiority of our approach.
CInvISP: Conditional Invertible Image Signal Processing Pipeline
ABSTRACT. Standard RGB (sRGB) images processed by the image signal processing (ISP) pipeline of digital cameras have a nonlinear relationship with the scene irradiance. Therefore, low-level vision tasks that work best in a linear color space are not well suited to being carried out in the sRGB color space. To address this issue, this paper proposes an approach called CInvISP to provide a bidirectional mapping between the nonlinear sRGB and linear CIE XYZ color spaces. To ensure a fully invertible ISP, the basic building blocks in our framework adopt the structure of an invertible neural network. As camera-style information is embedded in sRGB images, it must be completely removed during backward mapping and properly incorporated during forward mapping. To this end, a conditional vector is extracted from the sRGB input and inserted into each invertible building block. Experiments show that, compared to other mapping approaches, CInvISP achieves a more accurate bidirectional mapping between the two color spaces. Moreover, it is also verified that such a precise bidirectional mapping facilitates low-level vision tasks, including image denoising and retouching.
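Invertible building blocks of this kind are typically realized with coupling layers whose scale and shift networks receive the conditioning vector as an extra input. The sketch below is a generic conditional affine coupling block on flat feature vectors, offered only as an assumed illustration of the invertible-plus-conditioning principle, not the exact CInvISP block.

```python
import torch
import torch.nn as nn

class ConditionalCoupling(nn.Module):
    """Generic conditional affine coupling block: invertible by construction,
    with the scale/shift network conditioned on an external vector."""
    def __init__(self, channels, cond_dim, hidden=64):
        super().__init__()
        self.half = channels // 2
        self.net = nn.Sequential(
            nn.Linear(self.half + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (channels - self.half)),
        )

    def forward(self, x, cond):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(torch.cat([x1, cond], dim=1)).chunk(2, dim=1)
        y2 = x2 * torch.exp(torch.tanh(s)) + t        # tanh keeps the scale bounded
        return torch.cat([x1, y2], dim=1)

    def inverse(self, y, cond):
        y1, y2 = y[:, :self.half], y[:, self.half:]
        s, t = self.net(torch.cat([y1, cond], dim=1)).chunk(2, dim=1)
        x2 = (y2 - t) * torch.exp(-torch.tanh(s))
        return torch.cat([y1, x2], dim=1)

block = ConditionalCoupling(channels=6, cond_dim=8)
x, cond = torch.randn(2, 6), torch.randn(2, 8)
print(torch.allclose(x, block.inverse(block(x, cond), cond), atol=1e-5))
```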
Reimagining China-US Relations Prediction: A Multi-Modal, Knowledge-Driven Approach with KDSCINet
ABSTRACT. Statistical models and data-driven models have achieved remarkable results in international relation forecasting. However, most of these models share several drawbacks: (i) they rely on large amounts of expert knowledge, limiting their objectivity, applicability, usability, interpretability, and sustainability; (ii) they can only use structured unimodal data or cannot make full use of multimodal data. To address these two problems, we propose a knowledge-driven neural network architecture that conducts Sample Convolution and Interaction, named KDSCINet, for China-US relation forecasting. First, we filter events pertaining to China-US relations from the GDELT database. Then, we extract text descriptions and images from news articles and utilize the fine-tuned pre-trained model MKGformer to obtain embeddings. Finally, we connect the textual and image embeddings of each event with its structured event value in the GDELT database through a multi-head attention mechanism to generate time series data, which is then fed into KDSCINet for China-US relation forecasting. Our approach enhances prediction accuracy by establishing a knowledge-driven temporal forecasting model that combines structured data, textual data, and image data. Experiments demonstrate that KDSCINet (i) outperforms state-of-the-art methods on time series forecasting in the area of international relation forecasting and (ii) improves forecasting performance through the use of multimodal knowledge.
A Graph Convolution Neural Network for User-group Aided Personalized Session-based Recommendation
ABSTRACT. Session-based recommendation systems aim to predict the next user interaction based on the items with which the user interacts in the current session. Currently, graph neural network-based models have been widely used and proven more effective than others. However, these session-based models mainly focus on the user-item and item-item relations in historical sessions while ignoring information shared by similar users. To address the above issues, a new graph-based representation, the User-item Group Graph, which considers not only user-item and item-item but also user-user relations, is developed to take advantage of natural sequential relations shared by similar users. A new personalized session-based recommendation model is developed based on this representation. It first generates groups according to user-related historical item sequences and then uses a user-group preference recognition module to capture and balance group-item preferences and user-item preferences. Comparison experiments show that the proposed model outperforms other state-of-the-art models when similar users are effectively grouped. This indicates that grouping similar users can help find deep preferences shared by users in the same group and is instructive in finding the most appropriate next item for the current user.
RPUC: Semi-supervised 3D Biomedical Image Segmentation through Rectified Pyramid Unsupervised Consistency
ABSTRACT. Deep learning models have demonstrated remarkable performance in various biomedical image segmentation tasks. However, their reliance on a large amount of labeled data for training poses challenges as acquiring well-annotated data is expensive and time-consuming. To address this issue, semi-supervised learning (SSL) has emerged as a potential solution to leverage abundant unlabeled data. In this paper, we propose a simple yet effective consistency regularization scheme called Rectified Pyramid Unsupervised Consistency (RPUC) for semi-supervised 3D biomedical image segmentation. Our RPUC adopts a pyramid-like structure by incorporating three segmentation networks. To fully exploit the available unlabeled data, we introduce a novel pyramid unsupervised consistency (PUC) loss, which enforces consistency among the outputs of the three segmentation models and facilitates the transfer of cyclic knowledge. Additionally, we perturb the inputs of the three networks with varying ratios of Gaussian noise to enhance the consistency of unlabeled data outputs. Furthermore, three pseudo labels are generated from the outputs of the three segmentation networks, providing additional supervision during training. Experimental results demonstrate that our proposed RPUC achieves state-of-the-art performance in semi-supervised segmentation on two publicly available 3D biomedical image datasets.
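The abstract does not spell out the PUC loss, so the following is only an assumed minimal form of the unsupervised consistency term: pairwise agreement between the softened outputs of the three segmentation networks on the same unlabeled batch.

```python
import torch
import torch.nn.functional as F

def pyramid_consistency_loss(p1, p2, p3):
    """Assumed consistency term: pairwise MSE between the softmax outputs of the
    three segmentation networks on one unlabeled (and possibly noise-perturbed) batch."""
    q1, q2, q3 = F.softmax(p1, dim=1), F.softmax(p2, dim=1), F.softmax(p3, dim=1)
    return (F.mse_loss(q1, q2) + F.mse_loss(q2, q3) + F.mse_loss(q3, q1)) / 3.0

# Logits of three networks on one unlabeled 3D patch: (batch, classes, D, H, W).
logits = [torch.randn(2, 2, 16, 32, 32) for _ in range(3)]
print(pyramid_consistency_loss(*logits))
```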
A Developer Recommendation Method Based on Disentangled Graph Convolutional Network
ABSTRACT. Crowdsourcing Software Development (CSD) solves software development tasks by integrating resources from global developers. As more and more companies and developers move onto CSD platforms, the information overload problem of the platform makes it difficult to recommend suitable developers for a software development task. The interaction behavior between developers and tasks is often the result of complex latent factors. Existing developer recommendation methods are mostly based on deep learning, and their feature representations ignore the influence of latent factors on interactive behavior, leading to learned representations that lack robustness and interpretability. To solve the above problems, we present a developer recommendation method based on a disentangled graph convolutional network (DRDGC). Specifically, we use a disentangled graph convolutional network to separate the latent factors within the original features. Each latent factor contains specific information and is independent of the others, which makes the features constructed from the latent factors more robust and interpretable. Extensive experimental results show that DRDGC can effectively recommend the right developer for a task and outperforms the baseline methods.
Disentangling Node Metric Factors For Temporal Link Prediction
ABSTRACT. Temporal Link Prediction (TLP), one of the most closely studied tasks in graph mining, requires predicting future link probabilities from historical interactions. On the one hand, traditional methods based on node metrics, such as Common Neighbor, achieve satisfactory performance on the TLP task. On the other hand, node metrics overly emphasize the global influence of nodes while neglecting the personalization of different node pairs, which can sometimes mislead link prediction results. Meanwhile, mainstream TLP methods follow the standard paradigm of learning node embeddings, entangling favorable and harmful node-metric factors in the representation and reducing the model's robustness. In this paper, we propose a plug-and-play plugin called Node Metric Disentanglement (NMD), which can be applied to most TLP methods and boost their performance. It explicitly accounts for node metrics and disentangles them from the embedding representations generated by TLP methods. We adopt an attention mechanism to select information conducive to the TLP task and integrate it into the node embedding. Experiments on various state-of-the-art methods and dynamic graphs verify the effectiveness and universality of our NMD plugin. Our code is publicly available at https://github.com/tianlizhang/NMD.
Multi-scale Context Aggregation for Video-based Person Re-identification
ABSTRACT. For video-based person re-identification (Re-ID), how to effectively aggregate video features is the key to dealing with various complicated situations. Different from previous methods that first extract spatial features and later aggregate temporal features, in this paper we propose a Multi-scale Context Aggregation (MSCA) method to simultaneously learn spatial-temporal features from videos. Specifically, we design an Attention-aided Feature Pyramid Network (AFPN), which can recurrently aggregate the detailed and semantic information of multi-scale feature maps from the CNN backbone. To enable the aggregation to focus on more salient regions in the video, we embed a special Spatial-Channel Attention (SCA) module into each layer of the pyramid. To further enhance the feature representations with temporal information while extracting the spatial features, we design a Temporal Enhancement Module (TEM), which can be plugged into each layer of the backbone network in a plug-and-play manner. Comprehensive experiments on three standard video-based person Re-ID benchmarks demonstrate that our method is competitive with most state-of-the-art methods.
Heterogeneous Graph Prototypical Networks for Few-shot Node Classification
ABSTRACT. The node classification task is one of the most significant applications in the analysis of heterogeneous graphs, which have been widely used for modeling multi-typed interactions. Meanwhile, Graph Neural Networks (GNNs) have aroused wide interest due to their remarkable effects on node classification. However, applying GNNs to heterogeneous graph node classification faces challenges: the cumbersome cost of node labeling and the heterogeneity of graphs. Existing semi-supervised GNNs still require sufficient annotation, while learning classifiers independently of node embeddings cannot exploit the rich information effectively. Recently, few-shot learning has achieved competitive results on homogeneous graphs in addressing the performance degradation caused by label sparsity. However, few-shot learning in heterogeneous graphs is limited by the difficulty of extracting multiple semantic proximities. To this end, we propose a novel Heterogeneous graph Prototypical Network (HPN) in this paper. The proposed HPN consists of two modules: a graph structural module extracts node embeddings and semantic knowledge for meta-training by capturing heterogeneous structures, and a meta-learning module produces prototypes from heterogeneous induced subgraphs for important meta-training classes, which improves the utilization of information compared with traditional meta-learning. Experimental results on three real-world heterogeneous graphs demonstrate that HPN achieves outstanding performance and better stability.
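For context, the prototypical step that the meta-learning module builds on can be sketched in a few lines: class prototypes are the mean embeddings of the support nodes, and queries are assigned to the nearest prototype. HPN's heterogeneous structural module and induced-subgraph construction are not reproduced here.

```python
import torch

def prototypes(support_emb, support_labels, num_classes):
    """Class prototypes = mean embedding of the support nodes of each class."""
    return torch.stack([support_emb[support_labels == c].mean(dim=0)
                        for c in range(num_classes)])

def classify(query_emb, protos):
    """Assign each query node to the nearest prototype (negative Euclidean distance)."""
    logits = -torch.cdist(query_emb, protos)
    return logits.argmax(dim=1)

support = torch.randn(15, 64)                      # 3-way 5-shot support embeddings
labels = torch.arange(3).repeat_interleave(5)
protos = prototypes(support, labels, num_classes=3)
print(classify(torch.randn(10, 64), protos))
```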
Enhancing LSTM and fusing articles of law for legal text summarization
ABSTRACT. The growing number of public legal documents has led to an increased demand for automatic summarization. Considering the well-organized structure of legal documents, extractive methods can be an efficient approach to text summarization. Topic information is an important factor in summary extraction. The LSTM model fails to capture global topic information and suffers from long-distance information loss when dealing with legal texts, which are typically long. In this paper, we propose TS-LSTM, a multi-layer network structure that enhances LSTM with a topic vector and a slot memory unit fusing position information for extractive summarization of long texts in the legal domain. The topic information is used to interact with sentences, and the slot memory unit is used to model the long-range relationships between sentences. We conduct experiments on a Chinese legal text summarization dataset, and the experimental results demonstrate that our proposed method outperforms the baseline methods.
Link Prediction with Simple Path-Aware Graph Neural Networks
ABSTRACT. Graph Neural Networks (GNNs) are expert in node classification and graph classification, but are relatively weak on link prediction due to their limited expressiveness. Recently, two popular GNN variants, namely higher-order GNNs and the labeling trick, have been proposed to address the limitations of GNNs. Compared with plain GNNs, these variants provably capture inter-node patterns such as common neighbors, which facilitates link prediction. However, we notice that these methods actually suffer from two critical problems. First, their algorithmic complexities are impractical for large graphs. Second, we prove that although these methods can identify paths between target nodes, they cannot identify simple paths, which are fundamental in graph theory. To overcome these deficiencies, we systematically study the common advantages of previous link prediction GNNs and propose a novel GNN framework that summarizes these advantages while remaining simple and efficient. Various experiments show the effectiveness of our method.
AudioFormer: Channel Audio Encoder Based on Multi-Granularity Features
ABSTRACT. To address the lack of standardized feature extraction methods for speech emotion recognition and the insufficient depth of representation obtained from acoustic samples, we first propose a multi-granularity feature extraction method that preserves the integrity of the data features and overcomes the redundancy of existing feature extraction methods; second, we propose a Channel Audio Encoder model that uses different feature encoders to extract high-order features. Experiments show that the proposed multi-granularity-feature-based Channel Audio Encoder achieves state-of-the-art performance on the IEMOCAP dataset. We also evaluate the method on a real-scene dataset to demonstrate its usability and to provide a reference for aiding the diagnosis of mental illness.
A Context Aware Lung Cancer Survival Prediction Network by Using Whole Slide Images
ABSTRACT. Lung cancer has caused enormous harm to human life, and traditional whole slide image (WSI) based lung cancer survival prediction methods suffer from information loss and cannot maintain the spatial context of the images, which may play an important role in survival analysis. Meanwhile, the impact of the heterogeneity between medical images and natural images has been noticed for some pre-trained models used in medical image representation learning. In this paper, we propose a Context Aware Lung Cancer Survival Prediction Network (CA-SurvNet) using whole slide images, in which the survival prediction is decided by every patch of a WSI together with its spatial context. Specifically, the representation of each WSI patch is first learned via a self-supervised feature extractor; the representations are then sequentially concatenated, followed by a channel-wise dimensionality reduction that preserves the significant information while maintaining the spatial structure of the WSI. Extensive experiments on a large benchmark dataset validate the superiority of the proposed method over its state-of-the-art competitors, as well as the effectiveness of preserving the WSI spatial context for lung cancer survival prediction.
Restore Translation Using Equivariant Neural Networks
ABSTRACT. Invariance to spatial transformations such as translations and rotations is a desirable property and a basic design principle for classification neural networks. However, commonly used convolutional neural networks (CNNs) are actually very sensitive to even small translations. There is a vast body of work on achieving exact or approximate transformation invariance by designing transformation-invariant models or assessing the transformations. These works usually modify standard CNNs and harm performance on standard datasets. In this paper, rather than modifying the classifier, we propose a pre-classifier restorer that recovers translated (or even rotated) inputs to their original form, which can then be fed into any classifier for the same dataset. The restorer is based on a theoretical result that gives a sufficient and necessary condition for an affine operator to be translation-equivariant on a tensor space.
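The paper derives its restorer from the stated equivariance condition; purely as an assumed illustration of the pre-classifier idea, the sketch below undoes an integer circular translation with classic phase correlation, which is not the paper's operator.

```python
import numpy as np

def estimate_shift(img, reference):
    """Estimate, via phase correlation, the circular shift that maps img back onto reference."""
    f1, f2 = np.fft.fft2(reference), np.fft.fft2(img)
    cross = f1 * np.conj(f2)
    corr = np.fft.ifft2(cross / (np.abs(cross) + 1e-8)).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    return dy, dx

def restore(img, reference):
    """Shift the translated input back before feeding it to an unchanged classifier."""
    dy, dx = estimate_shift(img, reference)
    return np.roll(img, shift=(dy, dx), axis=(0, 1))

ref = np.random.rand(28, 28)
shifted = np.roll(ref, shift=(5, -3), axis=(0, 1))
print(np.allclose(restore(shifted, ref), ref))   # True: the translation is undone
```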
Text Spotting of Electrical Diagram Based on Improved PP-OCRv3
ABSTRACT. Text detection and recognition play an important role in the automatic management of electrical diagrams. However, images of electrical diagrams often have high resolution, and the text in them is uniquely formatted and densely distributed. These factors prevent general-purpose text spotting models from detecting and recognizing the text effectively. In this paper, we propose a text spotting model based on an improved PP-OCRv3 to achieve better performance on electrical diagrams. First, a region re-segmentation module based on pixel-line clustering is designed to correct detection errors on irregularly shaped text containing vertical and horizontal characters. Second, an improved BiFPN module with channel attention and depthwise separable convolution is introduced during text feature extraction to improve robustness to input images at different scales. Finally, a character re-identification module based on region extension and cutting is added during text recognition to reduce the adverse effects of simple and dense characters on the model. Experimental results show that our model outperforms state-of-the-art (SOTA) methods on electrical diagram datasets.
Action Prediction for Cooperative Exploration in Multi-agent Reinforcement Learning
ABSTRACT. Multi-agent reinforcement learning approaches have shown significant progress with the employment of exploration-enhanced methods. However, when dealing with challenging tasks that necessitate complex cooperation among agents, such methods exhibit low exploration efficiency and poor performance. This paper proposes PQmix, a method based on action-prediction rewards built on top of Qmix. PQmix uses the joint local observation of the agents and the next joint local observation after executing actions to predict the real joint action. The prediction error with respect to the real joint action is introduced as an intrinsic reward measuring the novelty of the joint state, so as to encourage the agents to actively explore the action-state space of the environment. We validate PQmix's performance against strong baselines on various MARL benchmarks. The experimental results demonstrate that PQmix outperforms state-of-the-art algorithms on the StarCraft Multi-Agent Challenge (SMAC).
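A minimal sketch of an action-prediction intrinsic reward follows; the predictor architecture, the use of cross-entropy as the prediction error, and the averaging over agents are assumptions rather than the exact PQmix design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActionPredictor(nn.Module):
    """Predicts the joint action from the joint observation and the next joint observation."""
    def __init__(self, obs_dim, n_agents, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * obs_dim * n_agents, hidden), nn.ReLU(),
            nn.Linear(hidden, n_agents * n_actions),
        )
        self.n_agents, self.n_actions = n_agents, n_actions

    def forward(self, obs, next_obs):
        logits = self.net(torch.cat([obs, next_obs], dim=-1))
        return logits.view(-1, self.n_agents, self.n_actions)

def intrinsic_reward(predictor, obs, next_obs, joint_actions):
    """Assumed intrinsic reward: prediction error of the real joint action (higher = more novel)."""
    logits = predictor(obs, next_obs)
    errors = F.cross_entropy(logits.flatten(0, 1), joint_actions.flatten(), reduction="none")
    return errors.view(joint_actions.shape).mean(dim=-1)   # one scalar per transition

pred = ActionPredictor(obs_dim=30, n_agents=3, n_actions=9)
obs = torch.randn(8, 3 * 30)            # flattened joint observation for a batch of 8
next_obs = torch.randn(8, 3 * 30)
actions = torch.randint(0, 9, (8, 3))
print(intrinsic_reward(pred, obs, next_obs, actions).shape)   # torch.Size([8])
```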
ABSTRACT. Radar target detection, one of the pivotal techniques in radar systems, aims to extract valuable information such as target distance and velocity from the received echo signals. However, with advances in aviation and electronic information technology, radar detection targets, scenarios, and environments have undergone profound transformations. The majority of conventional radar target detection methods are based on Constant False Alarm Rate (CFAR) techniques, which rely on certain distribution assumptions; when the detection scenario becomes intricate or dynamic, the performance of these detectors degrades significantly. Ensuring robust performance of radar target detection models in complex task scenarios has therefore emerged as a crucial concern. In this paper, we propose a radar target detection method based on a hybrid architecture of convolutional neural networks and autoencoder networks. The approach comprises a clutter suppression module and a target detection module. We conducted ablation and comparative experiments on publicly available and simulated radar echo datasets. The ablation experiments validated the effectiveness of the clutter suppression module, while the comparative experiments demonstrated the superior performance of our method over the comparison methods in complex background scenarios.
SLAM: A Lightweight Spatial Location Attention Module for Object Detection
ABSTRACT. To address the shortcomings of current object detection models, including large numbers of parameters, inaccurate localization of target bounding boxes, and ineffective detection, this paper proposes a lightweight spatial location attention module (SLAM). By learning the spatial location information in the input feature map, SLAM adaptively adjusts the attention weights of the location information while greatly improving the feature representation capability of the network. First, the SLAM module obtains the spatial distribution of the input feature map in the horizontal, vertical, and channel directions through average pooling and maximum pooling operations; it then generates the corresponding location attention weights with convolutions and activation functions, and finally produces the weighted feature map by aggregating the features along the three spatial directions. Extensive experiments show that the SLAM module improves detection performance on the MS COCO and PASCAL VOC 2012 datasets with almost no additional computational overhead.
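A hedged sketch of such a location-attention block is given below: the input is average- and max-pooled along the horizontal, vertical, and channel directions, small convolutions turn each pooled map into attention weights, and the weights rescale the feature map. The kernel sizes and the multiplicative aggregation are assumptions, not the exact SLAM design.

```python
import torch
import torch.nn as nn

class SpatialLocationAttention(nn.Module):
    """Assumed location-attention block built from directional average/max pooling."""
    def __init__(self, channels):
        super().__init__()
        self.conv_h = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.conv_c = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        # Horizontal direction: pool over width -> weights of shape (B, C, H, 1).
        h = torch.cat([x.mean(3, keepdim=True), x.amax(3, keepdim=True)], dim=1)
        # Vertical direction: pool over height -> weights of shape (B, C, 1, W).
        w = torch.cat([x.mean(2, keepdim=True), x.amax(2, keepdim=True)], dim=1)
        # Channel direction: pool over channels -> weights of shape (B, 1, H, W).
        c = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        a = (torch.sigmoid(self.conv_h(h))
             * torch.sigmoid(self.conv_w(w))
             * torch.sigmoid(self.conv_c(c)))
        return x * a

x = torch.randn(2, 64, 32, 32)
print(SpatialLocationAttention(64)(x).shape)   # torch.Size([2, 64, 32, 32])
```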
A Malicious Code Family Classification Method Based on RGB Images and Lightweight Model
ABSTRACT. In recent years, malware attacks have been a constant threat to network security, and the problem of how to classify malicious families quickly and accurately urgently needs to be addressed. Traditional malicious family classification methods are rendered ineffective by the proliferation of variants and are no longer adequate at the current stage of research. Visualization methods can best expose the core characteristics of malicious code in an image, but grayscale images suffer from few and monotonous features. In this paper, we propose a new malicious code visualization method. Specifically, we first convert the original malicious file into a byte file and an asm file using the IDA Pro tool. Secondly, we extract the opcode sequences from the asm file and the byte sequences from the byte file and convert them into a three-channel RGB image using visualization techniques, which allows a more comprehensive representation of the features of the malicious sample. Finally, we propose a new neural network architecture, a MobileNetV2 lightweight model combined with CBAM (MVCBAM), for training and prediction. In addition, we conduct extensive comparison experiments on the BIG2015 and Malimg datasets. The experiments show that the accuracy of our proposed model on the two datasets is 99.90% and 99.95%, respectively, achieved with fewer network parameters than the original MobileNetV2 model and with higher accuracy and faster speed than other advanced methods.
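The abstract does not specify how the opcode and byte sequences are distributed across the three channels, so the sketch below only illustrates the general byte-to-RGB packing step under an assumed layout (three consecutive bytes per pixel, fixed image width, zero-padded tail).

```python
import numpy as np

def bytes_to_rgb(byte_seq, width=256):
    """Pack a raw byte sequence into a fixed-width RGB image (assumed layout:
    three consecutive bytes form one RGB pixel; the tail is zero-padded)."""
    arr = np.frombuffer(byte_seq, dtype=np.uint8)
    pad = (-len(arr)) % (3 * width)
    arr = np.concatenate([arr, np.zeros(pad, dtype=np.uint8)])
    return arr.reshape(-1, width, 3)              # (height, width, 3) image

raw = np.random.randint(0, 256, 10000, dtype=np.uint8).tobytes()
print(bytes_to_rgb(raw).shape)                    # e.g. (14, 256, 3)
```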
A Novel Approach for Improved Pedestrian Walking Speed Prediction: Exploiting Proximity Correlation
ABSTRACT. Accurately predicting pedestrian speed is crucial for analyzing pedestrian behavior and optimizing intelligent transportation systems. This paper investigates the feasibility of modeling pedestrian walking speed as a time series. Building upon previous research highlighting the spatio-temporal nearest neighbor correlation in pedestrian walking speed, we propose a deep learning method that leverages this correlation. Experimental results demonstrate the superiority of our approach over traditional methods in accurately predicting pedestrian walking speed and capturing temporal characteristics and trends. The findings of this study have significant implications for enhancing pedestrian traffic flow management, improving the pedestrian travel experience, and enhancing overall traffic safety. Future research can focus on exploring advanced time series methods and deep learning models to further enhance the accuracy and practicality of pedestrian walking speed prediction.
Research on Relation Extraction Based on BERT with Multifaceted Semantics
ABSTRACT. Relation extraction is one of the important tasks in natural language processing, aiming to determine the class of relation to which the entities in a sentence belong. Unlike much current research, in which researchers tend to retrain language models on large-scale corpora for relation extraction, which requires substantial resources, this paper proposes a model based on BERT with multifaceted semantics (BERT-LR) for relation extraction, which learns semantics and performs relation extraction from multiple aspects with entities at the center. First, we make full use of the already pre-trained BERT model to obtain rich initialization parameters for our BERT-LR model. Second, to achieve entity-centric relation extraction, we propose a multifaceted semantic relation extraction model based on BERT consisting of left semantics, right semantics, and global semantics, and use a suitable method to fuse the multifaceted semantics. Third, we find that fixing the embedding layer of the model during fine-tuning achieves better results. Our approach achieves excellent results on the SemEval-2010 Task 8 dataset.
End-to-End Urban Autonomous Navigation with Decision Hindsight
ABSTRACT. Urban autonomous navigation has broad application prospects. Reinforcement Learning (RL) based navigation models can be continuously optimized through self-exploration, eliminating the need for human heuristics. However, training effective navigation models faces challenges due to the dynamic nature of urban traffic conditions and the exploration-exploitation dilemma in RL. Moreover, the limited vehicle perception and traffic uncertainty introduce potential safety hazards, hampering the real-world application of RL-based navigation models. In this paper, we propose a novel end-to-end urban navigation framework with decision hindsight. Formulating the problem as a Partially Observable Markov Decision Process (POMDP), we employ a causal Transformer-based autoregressive modeling approach to process the historical navigation information as supplementary observations. We then combine these historical observations with current perceptions to construct a history-feedforward state representation that enhances global awareness, improving data availability and decision predictability. Furthermore, by integrating the history-feedforward state encoding upstream, we develop an end-to-end learning framework based on RL to obtain a navigation model with decision hindsight, enabling more reliable navigation. To validate the effectiveness of our proposed method, we conduct experiments on challenging urban navigation tasks using the CARLA simulator. The results demonstrate that our method achieves higher learning efficiency and improved driving performance, outperforming prior methods on urban navigation benchmarks.
MView-DTI: A multi-view feature fusion-based approach for drug-target protein interaction prediction
ABSTRACT. Drug-Target protein Interaction (DTI) prediction is a critical task in the field of drug discovery. Deep learning-based prediction methods have been shown to significantly improve the accuracy of DTI prediction while reducing costs. Most existing methods treat drug molecules and proteins as graphs or sequences, extract features from them, and then utilize networks such as convolutional neural networks (CNN), graph neural networks (GNN), and Transformers for learning and prediction. However, drug molecular images clearly display features such as atoms, structures, and chemical bonds that are difficult to capture in sequences or graphs. Therefore, this paper proposes a deep learning method based on multi-view feature fusion that utilizes a Transformer to fuse the graph structure and image features of drug molecules and thereby learn more comprehensive drug features. This enables the learning of more complex interaction features between amino acids and atoms when modeling DTI. The model was evaluated on three benchmark datasets and achieved significant improvements over the latest baselines. Additionally, to validate the effectiveness of capturing drug image feature information, ablation experiments were conducted, and the results showed a significant increase in accuracy after incorporating image data.
ABSTRACT. Causal Emotion Entailment (CEE) aims to identify which utterances are responsible for the non-neutral emotion in a conversational utterance. Prior research on this topic has primarily focused on using sequential encoding to model conversational contexts, but has neglected to fully consider the impact of interactions between utterances and structure information. In this paper, we explore the significance of discourse parsing in addressing these interactions and structure information, and propose a new model called the discourse-aware model (DAM) to tackle the CEE task. Concretely, we jointly model CEE with discourse parsing using a multi-task learning (MTL) framework to integrate rich utterance discourse information into our model. In addition, we use a graph neural network (GNN) to further enhance our CEE model by explicitly encoding discourse and other discourse-related structure features. Results on the benchmark corpus show that DAM outperforms the state-of-the-art (SOTA) systems in the literature. This suggests that the discourse structure may contain a potential link between emotional utterances and their corresponding cause expressions.
Self-Adaptive Inverse Soft-Q Learning for Imitation
ABSTRACT. As a powerful method for solving sequential decision problems, imitation learning (IL) aims to generate a policy similar to expert behavior by imitating demonstrations. However, the quality of the demonstrations directly limits the performance of the agent's imitation policy. To solve this problem, self-adaptive inverse soft-Q learning for imitation (SAIQL) is proposed. SAIQL introduces a novel three-level buffer system by adding an online excellent buffer alongside the expert buffer and the normal buffer. Trajectories from interactions with superior performance are stored in the online excellent buffer. When the amount of data in the online excellent buffer equals that in the expert buffer, the data in the former are transferred to the latter and the former is cleared, ensuring that demonstrations in the expert buffer are continuously improved. Finally, we compare SAIQL with up-to-date IL methods on both continuous control and Atari tasks. The experimental results show the superiority of SAIQL: it improves the quality of expert demonstrations and the utilization of trajectories.
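The three-level buffer bookkeeping can be sketched as follows. The buffer capacities, the return threshold that defines a superior trajectory, and the exact promotion rule are assumptions made for illustration; the paper's implementation may differ.

from collections import deque

class ThreeLevelBuffer:
    # Illustrative buffer system: normal, online-excellent, and expert buffers.
    def __init__(self, expert_trajectories, capacity=10000, excellent_return=200.0):
        self.normal = deque(maxlen=capacity)
        self.excellent = deque(maxlen=capacity)
        self.expert = deque(expert_trajectories, maxlen=capacity)
        self.excellent_return = excellent_return

    def add_trajectory(self, trajectory, episode_return):
        # trajectories with superior performance go to the online excellent buffer
        if episode_return >= self.excellent_return:
            self.excellent.append(trajectory)
        else:
            self.normal.append(trajectory)
        # when the online excellent buffer has as much data as the expert buffer,
        # move its trajectories into the expert buffer and clear it
        if len(self.expert) > 0 and len(self.excellent) >= len(self.expert):
            self.expert.extend(self.excellent)
            self.excellent.clear()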
BFMOT: A One-Shot Baseline Model with Fusion Similarity Algorithm Towards Real-Time Multi-Object Tracking
ABSTRACT. A key challenge of multi-object tracking is to realize the trade-off between high accuracy and real-time performance. Recently, the one-shot tracker, which integrates multiple tasks into a unified network, has achieved a good balance between tracking accuracy and speed. Different from previous trackers' practice of exchanging extra computational cost for tracking accuracy, we propose a new one-shot baseline model that is faster and lighter. We further discuss the conflict under the tracking paradigm of joint detection and re-identification and strive to alleviate the feature conflict in the one-shot model. Furthermore, in order to improve the one-shot base model's ability to deal with complex scenarios, we innovate in data association and propose a fusion similarity association algorithm. On the MOT17 test set, the proposed association algorithm reduces the number of ID switches by 22.9% compared with the state-of-the-art association algorithm. On the MOT20 test set, the proposed BFMOT tracker improves the tracking accuracy (i.e. MOTA) by 6.7% compared with the most popular one-shot tracker. BFMOT is very simple and runs at 30 FPS on a single GPU, which is more oriented towards real-time multi-object tracking.
Debiasing Medication Recommendation with Counterfactual Analysis
ABSTRACT. AI-driven medication recommendation has emerged as a crucial undertaking in healthcare research. Recent literature has focused on leveraging patients' diagnoses, procedures, and historical visit information for medication recommendation. However, this approach can lead to recommendation biases due to spurious correlations among the historical visit information. Previous studies have either failed to address this bias issue or attempted to mitigate recommendation biases through dataset manipulation, albeit at the expense of increased computational costs. In this study, we propose CAMeR (Counterfactual Analysis based Medication Recommendation), a novel debiasing model based on counterfactual analysis. The model preserves medication information while emphasizing the primary influence of diagnoses and procedures. Unlike traditional factual reasoning approaches that address biases before or during training, counterfactual reasoning mitigates the impact of spurious correlations after training. Additionally, we incorporate contrastive loss computation in the embedding module of our model to calibrate the feature construction for patients with multiple visits. We validate CAMeR on the widely adopted MIMIC-III and MIMIC-IV datasets, and the experimental results unequivocally demonstrate its superiority over state-of-the-art methods.
Active Learning for Open-set Annotation Using Contrastive Query Strategy
ABSTRACT. Active learning has achieved significant success in classification tasks where all data samples are drawn from known classes. However, in real scenarios, most active learning methods fail when encountering the open-set annotation (OSA) problem, i.e., numerous samples from unknown classes. The main reason for such failure is that existing query strategies inevitably select unknown-class samples. To tackle this problem and select the most informative samples, we propose a novel active learning framework named OSA-CQ, which simplifies the detection of samples from known classes and enhances classification performance with an effective contrastive query strategy. Specifically, OSA-CQ first adopts an auxiliary network to distinguish samples using confidence scores, which can dynamically select the samples in the unlabeled set with the highest probability of belonging to known classes. Secondly, by comparing the predictions of the auxiliary network, the classifier, and feature similarity, OSA-CQ designs a contrastive query strategy to select the most informative samples from the unlabeled and known-class sets. Experimental results on CIFAR10 and CIFAR100 show that the proposed OSA-CQ can select highly informative samples from known classes and achieves higher classification performance with lower annotation cost than state-of-the-art active learning algorithms.
Cross-Domain Bearing Fault Diagnosis Method Using Hierarchical Pseudo Labels
ABSTRACT. Data-driven bearing fault diagnosis methods have become increasingly crucial for the health management of rotating machinery. However, in actual industrial scenarios, the scarcity of labeled data presents a challenge. To alleviate this problem, many transfer learning methods have been proposed. Some domain adaptation methods use models trained on the source domain to generate pseudo labels for target domain data, which are further employed to refine the models. Domain shift may introduce noise into the pseudo labels, thereby compromising the stability of the model. To address this issue, we propose a Hierarchical Pseudo Label Domain Adversarial Network. In this method, we divide pseudo labels into three levels and use different training approaches for samples at different levels. Compared with traditional threshold-filtering methods that focus on high-confidence samples, our method can effectively exploit the positive information of a large quantity of medium-confidence samples and mitigate the negative impact of mislabeling. Our proposed method achieves higher prediction accuracy than state-of-the-art domain adaptation methods in harsh environments.
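A minimal sketch of splitting target-domain pseudo labels by confidence is given below; the two thresholds and the three-way split are assumptions, since the abstract does not state the exact criteria used to define the three levels.

import numpy as np

def split_pseudo_labels(probs, high=0.9, low=0.5):
    # probs: (N, C) class probabilities predicted for target-domain samples
    confidence = probs.max(axis=1)
    pseudo_labels = probs.argmax(axis=1)
    high_idx = np.where(confidence >= high)[0]                           # trusted samples
    medium_idx = np.where((confidence >= low) & (confidence < high))[0]  # partially trusted
    low_idx = np.where(confidence < low)[0]                              # likely mislabeled
    return pseudo_labels, high_idx, medium_idx, low_idx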
Efficient Collaboration via Interaction Information in Multi-Agent System
ABSTRACT. Cooperative multi-agent reinforcement learning (CMARL) has shown promise in solving real-world scenarios. The interaction information between agents contains rich global information, which is easily neglected after perceiving other agents' behavior.
To tackle this problem, we propose Collaboration Interaction Information Modelling via Hypergraph (CIIMH), which first perceives the behavior of other agents by mutual information optimization and constructs the dynamic interaction information via hypergraph. Perceived behavioral features of other agents are further aggregated in the hypergraph convolutional network to obtain interaction information.
We compare our method with three existing baselines on StarCraft II micromanagement tasks (SMAC), Level-based Foraging (LBF), and Hallway. Empirical results show that our method outperforms baseline methods on all maps.
Violence-MFAS: Audio-Visual Violence Detection Using Multimodal Fusion Architecture Search
ABSTRACT. Audio-visual fusion methods are widely employed to tackle violence detection tasks, since they can effectively integrate the complementary information from both modalities to significantly improve accuracy. However, the design of high-quality multimodal fusion networks is highly dependent on expert experience and substantial efforts. To alleviate this formidable challenge, we propose a novel method named Violence-MFAS, which can automatically design promising multimodal fusion architectures for violence detection tasks using multimodal fusion architecture search (MFAS). To further enable the model to focus on important information, we elaborately design a new search space. Specifically, multilayer neural networks based on attention mechanisms are meticulously constructed to grasp intricate spatio-temporal relationships and extract comprehensive multimodal representation. Finally, extensive experiments are conducted on the commonly used large-scale and multi-scene audio-visual XD-Violence dataset. The promising results demonstrate that our method outperforms the state-of-the-art methods under the guarantee of a lightweight architecture.
DeepLink: Triplet Embedding and Spatio-Temporal Dynamics Learning of Link Representations for Travel Time Estimation
ABSTRACT. Estimating the time of arrival is a crucial task in intelligent transportation systems. The task poses challenges due to the dynamic nature and complex spatio-temporal dependencies of traffic networks. Existing studies have primarily focused on learning the dependencies between adjacent links on a route, often overlooking a deeper understanding of the links within the traffic network. To address this limitation, we propose DeepLink, a novel approach for travel time estimation that leverages a comprehensive understanding of the spatio-temporal dynamics of road segments from different perspectives. DeepLink introduces triplet embedding, enabling the learning of both the topology and potential semantics of the traffic network, leading to an improved understanding of links' static information. Then, a spatio-temporal dynamic representation learning module integrates the triplet embedding and real-time information, which effectively models the dynamic traffic conditions. Additionally, a local-global attention mechanism captures both the local dependencies of adjacent road segments and the global information of the entire route. Extensive experiments conducted on a large-scale real-world dataset demonstrate the superior performance of DeepLink compared to state-of-the-art methods.
Reversible Data Hiding in Encrypted Images based on Image Reprocessing and Polymorphic Compression
ABSTRACT. With the rapid development of cloud computing and privacy protection, Reversible Data Hiding in Encrypted Images (RDHEI) has attracted increasing attention, since it can achieve covert data transmission and lossless image recovery. To realize reversible data hiding with high embedding capacity, a new RDHEI method is proposed in this paper. First, we introduce the Image Reprocessing and Polymorphic Compression (IRPC) scheme, which can classify the images and then vacate enough room for embedding. After that, an improved RDHEI method combined with the IRPC scheme and a chaotic encryption algorithm is presented. In this method, the content owner uses the IRPC scheme to reserve embeddable rooms in the original image and then utilizes a six-dimensional chaotic encryption system to encrypt the reserved image into an encrypted image. After receiving the encrypted image, the data hider can embed additional data into it to obtain the marked encrypted image. According to the different keys the receiver has, the embedded data or the original image can be extracted or recovered from the marked encrypted image without error. Extensive experimental results show that the average Embedding Rate (ER) of our proposed method on the datasets BOSSbase, BOWS-2, and UCID is higher than that of the baseline method by 0.1 bpp. At the same time, the security performance of the image is also improved.
GRF-GMM: A Trajectory Optimization Framework for Obstacle Avoidance in Learning from Demonstration
ABSTRACT. Learning from demonstrations (LfD) provides a convenient way to teach robots skills without explicit programming. As an LfD approach, the Gaussian mixture model/Gaussian mixture regression (GMM/GMR) has been widely used for its robustness and effectiveness. However, many problems remain for GMM when an obstacle that is not present in the original demonstrations appears in the workspace of the robot. To address these problems, this paper presents a novel method based on a Gaussian repulsive field-Gaussian mixture model (GRF-GMM) for obstacle avoidance by optimizing the model parameters. A Gaussian repulsive force is calculated through Gaussian functions and applied to the Gaussian components to optimize the mixture distribution learnt from the original demonstrations. Our approach allows the reproduced trajectory to keep a safe distance from the obstacle. Finally, the feasibility and effectiveness of the proposed method are demonstrated through simulations and experiments.
IIHT: Medical Report Generation with Image-to-Indicator Hierarchical Transformer
ABSTRACT. Automated medical report generation has become increasingly important in medical analysis. It can produce computer-aided diagnosis descriptions and thus significantly alleviate the doctors' work.
Inspired by the huge success in neural machine translation and image captioning, various deep learning methods have been proposed for medical report generation. However, the existing methods suffer from the intrinsic challenges raised by data imbalance and bias within medical data, and thus the generated reports may exhibit linguistic fluency but lack clinical accuracy.
To tackle these challenges, we propose an image-to-indicator hierarchical transformer (IIHT) framework for medical report generation. It consists of three modules, i.e., a classifier module, an indicator expansion module and a generator module. These modules can effectively address the challenges caused by, for example, data imbalance and bias.
Furthermore, the proposed IIHT method allows radiologists to modify disease indicators in real-world scenarios and integrates these operations into the indicator expansion module for fluent and accurate medical report generation. Extensive experiments and comparisons with state-of-the-art methods under various evaluation metrics demonstrate the strong performance of the proposed method.
OD-Enhanced Dynamic Spatial-Temporal Graph Convolutional Network for Metro Passenger Flow Prediction
ABSTRACT. Metro passenger flow prediction is crucial for efficient urban transportation planning and resource allocation. However, it faces two challenges. The first challenge is extracting the diverse passenger flow patterns at different stations, e.g., stations near residential areas and stations near commercial areas, while the second is to model the complex dynamic spatial-temporal correlations caused by Origin-Destination (OD) flows. Existing studies often overlook these two aspects, especially the impact of OD flows. To this end, we propose an OD-enhanced dynamic spatial-temporal graph convolutional network (DSTGCN) for metro passenger flow prediction. First, we propose a static spatial module to extract the flow patterns of different stations. Second, we utilize a dynamic spatial module to capture the dynamic spatial correlations between stations with OD matrices. Finally, we employ a multi-resolution temporal dependency module to learn the delayed temporal features. We also conduct experiments on two real-world datasets from Shanghai and Hangzhou. The results show the superiority of our model compared to the state-of-the-art baselines.
Membership Inference Attack against Medical Databases
ABSTRACT. Membership inference is a powerful attack on private databases, especially for medical data. Existing attack models utilize a shadow model to infer the private members of a dataset, which can damage the interests of data owners and may cause serious data leakage. However, existing defences concentrate on encryption methods and ignore that such inference can also cause unacceptable losses in real applications. In this work, we propose a novel inference attack model that utilizes a shadow model to simulate the division system in a medical database and subsequently infer the members of the database. Moreover, the established shadow inference model can classify the labels of medical data and obtain the private members of the medical database. In contrast with traditional inference attacks, we apply the attack to medical databases rather than recommendation systems or machine learning classifiers. From our extensive simulations and comparisons with traditional inference attacks, we observe that the proposed model can carry out attacks on medical data with reasonable attack accuracy and acceptable computation cost.
Enhancing Heterogeneous Graph Contrastive Learning with Strongly Correlated Subgraphs
ABSTRACT. Graph contrastive learning maximizes the mutual information between the embedding representations of the same data instances in different augmented views of a graph, obtaining feature representations for graph data in an unsupervised manner without the need for manual labeling. Most existing node-level graph contrastive learning models only consider embeddings of the same node in different views as positive sample pairs, ignoring the rich inherent neighboring relations and resulting in a certain loss of contrastive information. To address this issue, we propose a heterogeneous graph contrastive learning model that incorporates strongly correlated subgraph features. We design a contrastive learning framework suitable for heterogeneous graphs and introduce high-level neighborhood information during the contrasting process. Specifically, our model selects a strongly correlated subgraph for each target node in the heterogeneous graph based on both topological structure information and node attribute features. In the calculation of the contrastive loss, we perform feature shifting operations on positive and negative samples based on the subgraph encoding to enhance the model's ability to discriminate between similar samples. We conduct node classification and ablation experiments on multiple public heterogeneous datasets and the results verify the effectiveness of our model's contributions.
TPTGAN: Two-Path-Transformer-Based Generative Adversarial Network Using Joint Magnitude Masking and Complex Spectral Mapping For Speech Enhancement
ABSTRACT. In recent studies, the conformer has been widely used in speech enhancement, but it still suffers from excessive suppression, especially in human-to-machine communication such as automatic speech recognition (ASR), because target speech is lost when filtering the noise. Therefore, while these methods may yield higher PESQ scores, they often exhibit limited effectiveness in improving the signal-to-noise ratio of speech, which has proved vital for ASR. In this paper, we propose a two-path-transformer-based metric generative adversarial network (TPTGAN) for speech enhancement in the time-frequency domain. The generator consists of an encoder, a two-stage transformer module, a magnitude mask decoder and a complex spectrum decoder. The encoder and two-path transformers characterize the magnitude and complex spectra of the inputs and model both sub-band and full-band information of the time-frequency spectrogram. The estimation of the magnitude and complex spectrum is decoupled in the decoder, and the enhanced speech is then reconstructed in conjunction with the phase information. Through intelligent training strategies and structural adjustments, we showcase the remarkable efficacy of the transformer model in speech enhancement tasks. The experimental results on the Voice Bank+DEMAND dataset illustrate that TPTGAN shows superior performance compared to state-of-the-art methods, with an SSNR of 11.63 and a PESQ of 3.35, which alleviates the problem of excessive suppression, while the complexity of the model (1.03M parameters) is significantly reduced.
Sample Selection based on Uncertainty for Combating Label Noise
ABSTRACT. Automatic segmentation of medical images plays a crucial role in scientific research and healthcare. Obtaining large-scale training datasets with high-quality manual annotations poses challenges in many clinical applications. Utilizing noisy datasets has become increasingly important, but label noise significantly affects the performance of deep learning models. Sample selection is an effective method for handling label noise. In this study, we propose a medical image segmentation framework based on entropy estimation uncertainty for sample selection to address datasets with noisy labels. Specifically, after sample selection, parallel training of two networks and cross-model information exchange are employed for collaborative optimization learning. Based on the exchanged information, sample selection is performed using entropy estimation uncertainty, following a carefully designed schedule for gradual label filtering and correction of noisy labels. The framework is flexible in terms of the precise deep neural network (DNN) models used. Method analysis and empirical evaluation demonstrate that our approach exhibits superior performance on open datasets with noisy annotations. The sample selection method outperforms small loss criterion approaches, and the segmentation results surpass those of traditional fully supervised models. Our framework provides a valuable solution for effectively handling noisy label datasets in medical image segmentation tasks.
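The entropy-based uncertainty behind the sample selection step can be written compactly as below; the selection ratio and the averaging over spatial positions are assumptions made for this sketch.

import torch

def entropy_uncertainty(logits):
    # per-pixel predictive entropy from segmentation logits of shape (N, C, H, W)
    probs = torch.softmax(logits, dim=1)
    return -(probs * torch.log(probs.clamp_min(1e-8))).sum(dim=1)

def select_clean_samples(logits, ratio=0.5):
    # keep the fraction of samples with the lowest mean entropy as presumably clean
    ent = entropy_uncertainty(logits).flatten(1).mean(dim=1)   # (N,) mean entropy per sample
    k = max(1, int(ratio * ent.numel()))
    return torch.topk(-ent, k).indices                         # indices of the k most certain samples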
MFSFFuse: Multi-Receptive Field Feature Extraction for Infrared and Visible Image Fusion using Self-Supervised Learning
ABSTRACT. Infrared and visible image fusion aims to fuse complementary information from different modalities to improve image quality and resolution and facilitate subsequent visual tasks. Most current fusion methods suffer from incomplete feature extraction or redundancy, resulting in indistinct targets or lost texture details. Moreover, infrared and visible image fusion lacks ground truth, and fusion results obtained by training the network without supervision may also lose important features. To solve these problems, we propose an infrared and visible image fusion method using self-supervised learning, called MFSFFuse. Specifically, we introduce a multi-receptive-field dilated convolution block that extracts multi-scale features using dilated convolutions. Additionally, different attention modules are employed to enhance information extraction in different branches. Furthermore, a specific loss function is devised to guide the optimization of the model towards an ideal fusion result. Extensive experiments show that, compared to state-of-the-art methods, our method achieves competitive results in both quantitative and qualitative evaluations.
Design of a Multimodal Short Video Classification Model
ABSTRACT. With the development of the mobile Internet, a large amount of short video data is generated online. The pressing problem in short video classification is how to better fuse information from different modalities. This paper proposes a short video multimodal fusion (SV-MF) scheme based on deep learning combined with pre-trained models to complete the classification of short videos. The main innovations of the SV-MF scheme are as follows: (1) We find that text modalities contain higher-order information and tend to perform better than audio and visual modalities, and with the use of pre-trained language models, text modalities are further improved in multimodal video classification. (2) Due to the strong semantic representation ability of text, the SV-MF scheme proposes a Transformer-based local fusion method for low-order visual and audio modal information to alleviate the information deviation caused by multimodal fusion. (3) The SV-MF scheme proposes a keyword-based post-processing strategy to further improve the classification accuracy of the model. Experimental results on a multimodal short video classification dataset derived from social networks show that the SV-MF scheme outperforms previous video fusion schemes.
Multi-vehicle Platoon Overtaking Using NoisyNet Multi-Agent Deep Q-Learning Network
ABSTRACT. With recent advancements in Vehicle-to-Vehicle communication technology, autonomous vehicles are able to connect and collaborate in platoons, minimizing accident risks, costs, and energy consumption. The significant benefits of vehicle platooning have gained increasing attention from the automation and artificial intelligence communities. However, few studies have focused on platooning with overtaking. To address this problem, a NoisyNet multi-agent deep Q-learning algorithm is developed in this paper, in which NoisyNet is employed to improve exploration of the environment. By considering the factors of overtaking, speed, collision, time headway and following vehicles, a domain-tailored reward function is proposed to accomplish safe platoon overtaking at high speed. Finally, simulation results show that the proposed method achieves successful overtaking in various traffic density situations.
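A possible shape of the domain-tailored reward combining the listed factors is sketched below; all weights and penalty terms are illustrative assumptions, not the coefficients used in the paper.

def platoon_overtake_reward(overtook, speed, target_speed, collided,
                            time_headway, desired_headway=1.5,
                            followers_in_platoon=0):
    # combine overtaking, speed, collision, time headway and following-vehicle terms
    r = 0.0
    r += 1.0 if overtook else 0.0                   # reward a completed overtake
    r += 0.5 * min(speed / target_speed, 1.0)       # encourage high (but bounded) speed
    r -= 10.0 if collided else 0.0                  # strong collision penalty
    r -= 0.2 * abs(time_headway - desired_headway)  # keep a safe time headway
    r += 0.1 * followers_in_platoon                 # keep the platoon together
    return r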
Learning Adaptable Risk-Sensitive Policies to Coordinate in Multi-Agent General-Sum Games
ABSTRACT. In general-sum games, the interaction of self-interested learning agents commonly leads to socially worse outcomes, such as defect-defect in the iterated stag hunt (ISH). Previous works address this challenge by sharing rewards or shaping their opponents’ learning process, which require too strong assumptions. In this paper, we observe that agents trained to optimize expected returns are more likely to choose a safe action that leads to guaranteed but lower rewards. To overcome this, we present Adaptable Risk-Sensitive Policy (ARSP). ARSP learns the distributions over agent's return and estimates a dynamic risk-seeking bonus to discover risky coordination strategies. Furthermore, to avoid overfitting training opponents, ARSP learns an auxiliary opponent modeling task to infer opponents' types and dynamically alter corresponding strategies during execution. Extensive experiments show that ARSP agents can achieve stable coordination during training and adapt to non-cooperative opponents during execution, outperforming a set of baselines by a large margin.
Multi-intent Description of Keyword Expansion for Code Search
ABSTRACT. To address the issue of discrepancies between online query data and offline training data in code search research, we propose a novel code search model called multi-intent description keyword extension-based code search (MDKE-CS). Our model utilizes offline training data to expand query data, thereby mitigating the impact of insufficient query data and intention differences between training and query data on search results. Furthermore, we construct a multi-intention description keyword vocabulary based on developers, searchers, and discussants from the StackOverflow Q&A library to further expand the query. To evaluate the effectiveness of MDKE-CS on code search tasks, we conducted comparative experimental analyses using two baseline models, DeepCS and UNIF, as well as the WordNet and BM25 extension methods. Our experimental results demonstrate that MDKE-CS outperforms the baseline models in terms of R@1, R@5, R@10, and MRR values.
Knowledge Prompting with Contrastive Learning for Unsupervised Commonsense Question Answering
ABSTRACT. Unsupervised commonsense question answering is an emerging task in the natural language processing domain. In this task, knowledge is of vital importance. Most existing research focuses on stacking large-scale models or extracting knowledge from external sources. However, these methods suffer from either the unstable quality of knowledge or the deficiency in the model's flexibility. In this paper, we propose a Knowledge Prompting with Contrastive Learning (KPCL) model to address these problems. Specifically, we first consider dropout noise as augmentation for commonsense questions. Then we apply unsupervised contrastive learning in further pre-training to capture the nuances among questions, and thus help with the subsequent knowledge generation. After that, we utilize generic prompts to generate question-related knowledge descriptions in a zero-shot manner, facilitating easier transfer to new domains. Furthermore, we concatenate the knowledge descriptions with the commonsense question, forming integrated question statements. Finally, we reason over them to score the confidence and make predictions. Extensive experimental results on three benchmark datasets demonstrate the effectiveness and robustness of our proposed KPCL, which consistently outperforms baseline methods.
Application of ALMM Technology to Intelligent Control System for a Fleet of Unmanned Aerial Vehicles
ABSTRACT. The article describes an intelligent information system for managing a fleet of Unmanned Aerial Vehicles (UAVs) while taking into account various dynamically changing constraints. Variability of the time intervals available for flights over certain areas located close to airports is one of the essential constraints. The system must be developed very flexibly to easily accommodate changes in both flight destinations and performance conditions. The authors propose the application of Algebraic Logic Meta Modelling (ALMM) technology to design and implement the models and algorithms used in this system. The article presents part of the research carried out by the authors during the design of the aforementioned system. An Algebraic Logic (AL) model of the UAV flight scheduling optimization problem is given. The execution of overflights in the circumpolar zone with the criterion of minimizing the total completion time of all tasks, Cmax, is described. A hybrid algorithm for solving this problem and the results of the experiments carried out are then presented. The component nature of the proposed approach allows easy transposition of the models and algorithms to more complex cases with additional assumptions and restrictions arising when managing flights in real conditions.
Effective skill learning on vascular robotic systems: Combining offline and online reinforcement learning
ABSTRACT. Vascular robotic systems, which have gained popularity in clinical practice, provide a platform for potentially semi-automated surgery. Reinforcement learning (RL) is an appealing skill-learning method to facilitate automatic instrument delivery. However, the notorious sample inefficiency of RL has limited its application in this domain. To address this issue, this paper proposes a novel RL framework, Distributed Reinforcement learning with Adaptive Conservatism (DRAC), that learns manipulation skills with a modest amount of interaction. DRAC pretrains skills from rule-based interactions before online fine-tuning to utilize prior knowledge and improve sample efficiency. Moreover, DRAC uses adaptive conservatism to explore safely during online fine-tuning and a distributed structure to shorten training time. Experiments in a pre-clinical environment demonstrate that DRAC can deliver a guidewire to the target with less dangerous exploration and better performance than prior methods (success rate of 96.00% and mean backward steps of 9.54) within 20k interactions. These results indicate that the proposed algorithm is promising for learning skills for vascular robotic systems.
Impulsive Accelerated Reinforcement Learning for H∞ Control
ABSTRACT. This paper revisits reinforcement learning for $H_\infty$ control of affine nonlinear systems with partially unknown dynamics. By incorporating an impulsive momentum-based control into the conventional critic neural network, an impulsive accelerated reinforcement learning algorithm, introducing an accelerated gradient flow with a restart mechanism, is proposed to improve the convergence speed and transient performance compared to traditional gradient descent-based techniques or continuously accelerated gradient methods. Moreover, by utilizing the quasi-periodic Lyapunov function method, an asymptotic stability criterion of the closed-loop system is established. A numerical example with comparisons is provided to illustrate the theoretical results.
ABSTRACT. Target association is an extremely important problem in the field of multi-object tracking, especially for pedestrian scenes with high appearance similarity and dense distribution. The traditional approach of combining IOU and ReID techniques with the Hungarian algorithm only partially addresses these challenges. To improve the model's association matching ability, this paper proposes a block matching model that extracts local features using a Block Matching Module (BMM) based on the Transformer model. The BMM divides features into blocks and mines effective features of the target to evaluate target similarity. Additionally, a Euclidean Distance Module (EDM) based on a Euclidean distance association matching strategy is introduced to further enhance the model's association ability. By integrating the BMM and EDM into the same multi-object tracking model, this paper establishes a novel model called BWTrack that achieves excellent performance on MOT16, MOT17, and MOT20 while running at 7 FPS on a single GPU.
PnP: Integrated Prediction and Planning for Interactive Lane Change in Dense Traffic
ABSTRACT. Making human-like behaviors for autonomous driving in interactive scenarios is critical and challenging, which requires the self-driving vehicle to reason about interactive vehicles' reactions to its behavior. We propose an integrated prediction and planning (PnP) decision-making method to address this task. To consider the interactive behaviors, a reactive trajectory prediction model is designed to predict the future states of other actors. Then, n-step temporal-difference search combining the value estimation network with the reactive prediction model is used to make a tactical decision and plan the tracking trajectory for the self-driving vehicle. The proposed PnP method is evaluated in the CARLA simulator and the results verify that PnP achieves better performances than popular model-free and model-based reinforcement learning baselines.
Modeling online adaptive navigation in virtual environments based on PID control
ABSTRACT. It is well known that locomotion-dominated navigation tasks may strongly provoke cybersickness effects. Past research has proposed numerous approaches to tackle this issue based on offline considerations. In this work, a novel approach to mitigate cybersickness is presented based on online adaptive navigation. Considering the Proportional-Integral-Derivative (PID) control method, we propose a mathematical model for online adaptive navigation parameterized with several parameters, taking as input the users' electro-dermal activity (EDA), an efficient indicator of the cybersickness level, and providing as output adapted navigation accelerations. Minimizing the cybersickness level is therefore regarded as an optimization problem: find the PID model parameters that reduce the severity of cybersickness. User studies were organized to collect non-adapted navigation accelerations and the corresponding EDA signals. A deep neural network was then formulated to learn the correlation between EDA and navigation accelerations. The hyperparameters of the network were obtained through the Optuna open-source framework. To validate the performance of the optimized online adaptive navigation developed through PID control, we performed an analysis in a simulated user study based on the pre-trained deep neural network. Results indicate a significant reduction of cybersickness in terms of EDA signal analysis and motion sickness dose value. This is a pioneering work that presents a systematic strategy for adaptive navigation settings from a theoretical point of view.
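The PID mapping from the measured EDA signal to an acceleration adjustment can be sketched as below. The target EDA level, the gains and the time step are placeholders; in the paper these parameters are the quantities being optimized.

class PIDNavigationAdapter:
    # error = target EDA - measured EDA; output = correction added to the navigation acceleration
    def __init__(self, kp, ki, kd, target_eda, dt=0.1):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.target_eda, self.dt = target_eda, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, measured_eda, base_acceleration):
        error = self.target_eda - measured_eda
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        correction = self.kp * error + self.ki * self.integral + self.kd * derivative
        return base_acceleration + correction   # adapted navigation acceleration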
A 3D UWB hybrid localization method based on BSR and L-AOA
ABSTRACT. In this paper, a base station reliability (BSR) measure and a low-cost angle of arrival (L-AOA) positioning method are proposed to optimize the result of the time difference of arrival (TDOA) positioning method; the optimized result is then substituted into the Taylor algorithm as the initial value for iterative refinement. Experimental results in a non-line-of-sight environment show that the proposed method improves localization accuracy by about 20% compared with TDOA alone. In addition, we apply the algorithm to the hybrid TDOA-Taylor method, and under the same error environment the positioning accuracy is improved by about 10%.
MEFaceNets: Multi-scale Efficient CNNs for Real-time Face Recognition on Embedded Devices
ABSTRACT. The trend of face recognition being widely used on terminals and embedded devices makes the trade-off between recognition accuracy and actual delay critical. To address this challenge, we propose an efficient bottleneck named MEBottleneck, which utilizes convolution kernels of different sizes on two branches to capture multi-scale features in the bottleneck, followed by a $1 \times 1$ expansion layer to fuse multi-scale features, thereby improving the representation ability. Then, to balance the trade-off between accuracy and latency, we design a family of lightweight models with MEBottleneck specifically for face recognition, named MEFaceNets. Large kernels are used for depthwise convolutions in shallow layers, resulting in improved accuracy. We evaluate the proposed models on several popular face recognition benchmarks. Our primary model achieves 99.80% face verification accuracy on LFW and exhibits excellent performance on the larger and more challenging benchmarks, including MegaFace Challenge 1, IJB-B and IJB-C. Meanwhile, the latency of our primary model is 90 ms on RK3399, which is sufficient to satisfy real-time recognition on the resource-constrained embedded device.
Federated learning using the Particle Swarm Optimization model for the early detection of COVID-19
ABSTRACT. The COVID-19 pandemic has created significant global health and socioeconomic challenges, which urges the need for efficient and effective early detection methods. Several traditional machine learning (ML) and deep learning (DL) approaches have been used for the detection of COVID-19, but ML and DL strategies face challenges such as transmission delays, a lack of computing power, communication delays, and privacy concerns. Federated Learning (FL) has emerged as a promising method for training models on decentralized data while ensuring privacy. In this paper, we present a novel FL framework for early detection of COVID-19 using a particle swarm optimization (PSO) model. The proposed framework combines the advantages of both FL and PSO. By employing the PSO technique, the model aims to achieve faster convergence and improved performance. To validate the effectiveness of the proposed approach, we performed experiments using a COVID-19 image dataset collected from different healthcare institutions. The results indicate that our approach is effective, achieving an accuracy of 94.36%, which is higher than that of traditional centralized learning approaches. Furthermore, the FL framework ensures data privacy and security by keeping sensitive patient information decentralized and only sharing aggregated model updates during the training process.
On Searching for Minimal Integer Representation of Undirected Graphs
ABSTRACT. The succinct representation of graphs is relevant to store, communicate, and sample the space of unstructured graphs meeting user-defined criteria. In this paper, we investigate the performance of eight classes of gradient-free optimization heuristics based on Differential Evolution to search for minimal integer representations of undirected graphs. Our computational experiments using graph instances with varying degrees of sparsity have shown the merit of exploration strategies to attain better convergence with few function evaluations. Our results have the potential to elucidate new number-based approaches for graph representation, design and optimization.
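One natural integer encoding of an undirected graph, which such gradient-free heuristics could then minimize over vertex relabelings, reads the upper triangle of the adjacency matrix as a bit string. This encoding and the brute-force baseline below are assumptions made for illustration; the paper's exact representation and search operators may differ.

import itertools
import networkx as nx

def graph_to_int(g, order=None):
    # read the upper-triangular adjacency bits row by row under a vertex ordering
    order = list(order) if order is not None else list(g.nodes())
    bits = 0
    for i, j in itertools.combinations(range(len(order)), 2):
        bits = (bits << 1) | int(g.has_edge(order[i], order[j]))
    return bits

def minimal_int_brute_force(g):
    # exact minimum over all orderings, feasible only for tiny graphs;
    # heuristics such as Differential Evolution search this space approximately
    return min(graph_to_int(g, p) for p in itertools.permutations(g.nodes()))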
Anti-Interference Zeroing Neural Network Model for Time-Varying Tensor Square Root Finding
ABSTRACT. Square root finding plays an important role in many scientific and engineering fields, such as optimization, signal processing and state estimation, but existing research mainly focuses on solving the time-invariant matrix square root problem. So far, few researchers have studied the time-varying tensor square root (TVTSR) problem. In this study, a novel anti-interference zeroing neural network (AIZNN) model is proposed to solve the TVTSR problem online. With the activation of the advanced power activation function (APAF), the AIZNN model is robust in solving the TVTSR problem in the presence of vanishing and non-vanishing disturbances. We present a detailed theoretical analysis showing that, with the AIZNN model, the error trajectory converges to zero within a fixed time, and we also derive an upper bound on the convergence time. Numerical experiments further verify the robustness of the proposed AIZNN model. Both the theoretical analysis and the numerical experiments show that the proposed AIZNN model provides a novel and noise-tolerant way to solve the TVTSR problem online.
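For readers unfamiliar with zeroing neural networks, the generic construction behind such models is written below in LaTeX; the tensor product, the disturbance term and the activation are kept generic, and the specific design of the advanced power activation function is not reproduced here.

% Generic zeroing-neural-network construction for X(t) * X(t) = A(t),
% where * denotes the chosen tensor product.
\begin{align}
  E(t)       &= X(t) * X(t) - A(t), \\
  \dot{E}(t) &= -\gamma\, \Phi\bigl(E(t)\bigr) + \Delta(t),
\end{align}
% with design gain \gamma > 0, activation \Phi applied element-wise, and
% disturbance \Delta(t). Choosing \Phi as a power-type (fixed-time-stable)
% activation is what yields convergence of E(t) to zero within a bounded time.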
CACL: Commonsense-Aware Contrastive Learning for Knowledge Graph Completion
ABSTRACT. Most knowledge graphs (KGs) are incomplete in the real world, so knowledge graph completion (KGC) is widely investigated to predict the most credible missing facts from given knowledge. However, existing KGC methods rely heavily on the given facts to predict missing relations between entities, ignoring the value of external knowledge. In addition, previous knowledge representation methods ignore the multi-perspective characteristics of related knowledge, which leads to the inability to obtain high-level semantic representations of knowledge. To alleviate these issues, this paper proposes a Commonsense-Aware Contrastive Learning (CACL) framework, which extracts relevant knowledge triples from an existing commonsense knowledge base to assist KGC. Moreover, our method employs a knowledge-contrastive representation learning method to acquire higher-order representations from multiple perspectives. Experiments show that our method improves the performance of basic knowledge graph embedding (KGE) models and can be easily adapted to various KGE models.
Identifying Self-Admitted Technical Debt with Context-based Ladder Network
ABSTRACT. Technical debt is inevitable in software development. The accumulation of technical debt will make the software fixes prohibitively expensive. Self-admitted technical debt (SATD) is a type of technical debt. Identifying SATD in code comments can improve code quality. However, manually discerning whether code comments contain SATD would be expensive and time-consuming. To solve this problem, we propose a method to apply the Ladder Network with the pre-training model to identify SATD based on the labeled data from 10 open source projects and the unlabeled data from another ten projects. By comparing with the original model of Ladder Network, and other semi-supervised learning models, the results show that the proposed method performs better in technical debt identification. In addition, the proposed method also achieves better results compared with supervised learning methods. This shows that our approach can make better use of unlabeled data to improve classification performance.
A Novel Machine Learning Model using CNN-LSTM Parallel Networks for Predicting Ship Fuel Consumption
ABSTRACT. With the continuous increase of carbon emissions, precise prediction of ship fuel consumption is gaining significance for reducing the energy consumption and emissions of ships. However, existing approaches for estimating fuel consumption still have significant room for improvement in terms of efficiency and accuracy. Furthermore, previous studies have not focused on capturing both short- and long-term properties and the traits of multi-sensor data. Considering the above issues, a novel machine learning model with CNN-LSTM parallel networks is proposed in this paper by combining convolutional neural network, long short-term memory and artificial neural network models. The proposed model integrates the advantages of the three single models: it comprehensively considers the temporal and non-linear properties of fuel consumption data through the convolutional neural network and long short-term memory, and utilizes artificial neural networks and a parallel learning mechanism to achieve multi-source data fusion. Moreover, the proposed model is shown to be effective on multi-source data from a liquefied petroleum gas carrier. Experimental outcomes suggest that the CNN-LSTM parallel network is the best choice, with an RMSE of 0.0243, which is 5.81%, 58.25% and 37.85% lower than that of the convolutional neural network, long short-term memory and artificial neural network, respectively. Therefore, the proposed model can significantly enhance the energy efficiency of the ship and reduce operating expenses and emissions.
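The parallel-branch fusion can be sketched in PyTorch as follows. The layer sizes, the use of 1-D convolutions over the sensor window, and the choice of separate static features for the ANN branch are assumptions made for this sketch.

import torch
import torch.nn as nn

class CNNLSTMParallel(nn.Module):
    # illustrative parallel CNN / LSTM / ANN model for fuel-consumption regression
    def __init__(self, n_sensors, static_dim, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(                 # local temporal patterns
            nn.Conv1d(n_sensors, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.lstm = nn.LSTM(n_sensors, hidden, batch_first=True)            # long-term dependence
        self.ann = nn.Sequential(nn.Linear(static_dim, hidden), nn.ReLU())  # static ship traits
        self.head = nn.Linear(3 * hidden, 1)      # fuse the three branches

    def forward(self, window, static_feats):
        # window: (B, T, n_sensors); static_feats: (B, static_dim)
        c = self.cnn(window.transpose(1, 2)).squeeze(-1)   # (B, hidden)
        _, (h, _) = self.lstm(window)
        l = h[-1]                                          # (B, hidden)
        a = self.ann(static_feats)                         # (B, hidden)
        return self.head(torch.cat([c, l, a], dim=1))      # (B, 1) predicted fuel consumption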
Deep Learning-Empowered Unsupervised Maritime Anomaly Detection
ABSTRACT. Automatically detecting anomalous vessel behaviour is an extremely crucial problem in intelligent maritime surveillance. In this paper, a deep learning-based unsupervised method is proposed for detecting anomalies in vessel trajectories, operating at both the image and pixel levels. The original trajectory data is converted into a two-dimensional matrix representation to generate a vessel trajectory image. A Wasserstein generative adversarial network (WGAN) model is trained on a dataset of normal vessel trajectories, while an encoder is simultaneously trained to map the trajectory image to a latent space. During anomaly detection, the vessel trajectory image is mapped to a hidden vector by the encoder, which is then used by the generator to reconstruct the input image. The anomaly score is computed from the residuals between the input and reconstructed trajectory images together with the discriminator's feature residuals, enabling image-level anomaly detection. Furthermore, pixel-level anomaly detection is achieved by analyzing the residuals of the reconstructed image pixels to localize the anomalous trajectory. The proposed method is compared to autoencoder (AE) and variational autoencoder (VAE) techniques, and experimental results demonstrate its superior performance in anomaly detection and pixel-level localization. This method has substantial potential for detecting anomalies in vessel trajectories, as it can detect anomalies in arbitrary waters without prior knowledge, relying solely on training with normal vessel trajectories. This approach significantly reduces the need for human and material resources. Moreover, it provides valuable insights and references for trajectory anomaly detection in other domains, holding both theoretical and practical importance.
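The image-level score described above can be sketched in the style of f-AnoGAN scoring; the weighting factor and the use of an intermediate discriminator feature map are assumptions of this sketch rather than the paper's exact formulation.

import torch

def anomaly_score(x, encoder, generator, disc_features, lam=0.1):
    # x: (B, C, H, W) trajectory images; disc_features returns an intermediate
    # feature map of the discriminator
    with torch.no_grad():
        x_rec = generator(encoder(x))                                # reconstruct from the latent code
        rec_residual = (x - x_rec).abs()                             # pixel-level residuals
        feat_residual = (disc_features(x) - disc_features(x_rec)).abs()
    score = rec_residual.flatten(1).mean(dim=1) + lam * feat_residual.flatten(1).mean(dim=1)
    return score, rec_residual   # rec_residual supports pixel-level localization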
Traffic Data Recovery and Outlier Detection based on Non-Negative Matrix Factorization and Truncated-Quadratic Loss Function
ABSTRACT. Intelligent Transportation System (ITS) plays a critical role in managing traffic flow and ensuring safe transportation. However, the presence of missing and corrupted traffic data may undermine the accuracy and reliability of the system. The problem of recovering traffic data can often be transformed into a low-rank matrix factorization problem by exploiting the intrinsic low-rank characteristics of the traffic matrix. While many existing methods demonstrate excellent recovery performance under the assumption of noiseless or Gaussian noise, they often exhibit suboptimal performance in the presence of outliers. In this paper, we propose a novel method for recovering traffic data using non-negative matrix factorization with a truncated-quadratic loss function. Although the objective function in our model is non-convex and non-smooth, we convert it to a convex formulation using half-quadratic theory. Then, a solver based on block coordinate descent is developed. Our experiments on real-world traffic datasets demonstrate superior performance compared to state-of-the-art methods.
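A common form of the truncated-quadratic loss and the resulting factorization objective is written below in LaTeX; the truncation threshold and any regularization terms used in the paper may differ.

% Truncated-quadratic loss on an entry-wise residual e, with threshold \tau > 0:
\begin{equation}
  \rho_\tau(e) =
  \begin{cases}
    e^2,    & |e| \le \tau, \\
    \tau^2, & |e| > \tau,
  \end{cases}
  \qquad
  \min_{W \ge 0,\; H \ge 0} \; \sum_{(i,j) \in \Omega} \rho_\tau\bigl(M_{ij} - (WH)_{ij}\bigr),
\end{equation}
% where \Omega indexes the observed traffic-matrix entries. Half-quadratic theory
% introduces auxiliary weights so that each subproblem becomes a weighted
% least-squares step, solvable by block coordinate descent.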
Road Surface Segmentation and Detection Under Extreme Weather Conditions Based on Mask-RCNN
ABSTRACT. The recognition of road surface conditions is a crucial factor that impacts traffic safety management and control. However, conventional intelligent models face challenges in accurately detecting road surfaces during extreme weather, which significantly hampers recognition. This paper proposes an intelligent detection model for road surfaces under extreme weather conditions. The model uses a Mask-RCNN architecture with Resnet101 and Feature Pyramid Network layers as its backbone network. In addition, we present a detailed study of the parameter training method for the model. This enables the design of a complete process for intelligent road surface detection in severe weather. The processing method eliminates interference in the image and then utilizes the proposed model to conduct object detection and segmentation on the processed image. The experimental results demonstrate that the proposed model can accurately detect road surfaces, ultimately improving the recognition accuracy of road surface conditions.
Causal-Inspired Influence Maximization in Hypergraphs Under Temporal Constraints
ABSTRACT. Influence Maximization (IM) is a significant problem that aims to find a set of seed nodes to maximize the spread of given events in social networks. Previous studies have contributed to the efficiency and online dynamics of basic IM on classical graph structures. However, they lack adequate consideration of individual and group behavior in the propagation probability. This can be attributed to inadequate attention to node Individual Treatment Effects (ITE), which significantly divide the sensitive attributes of nodes and impact the probability of propagation. Additionally, research on temporally constrained influence spreading under higher-order interference on hypergraphs is limited. To fill these two gaps, we introduce two sets of basic assumptions about the impact of ITE on the propagation process and develop a new diffusion model: the Latency Aware Contact Process on Causal Independent Cascading (LT-CPCIC) under time constraints on hypergraphs. We then design the Causal-Inspired Cost-Effective Balanced Selection algorithm (CICEB) for the proposed model. CICEB first recovers node ITE from observational data and then uses three types of debiasing strategies to weaken the correlation between the propagation effects of different pre- and post-nodes. Finally, we compare CICEB with traditional methods on two real-world datasets and show that it achieves better effectiveness and robustness.
Detection of Anomalies and Explanation in Cybersecurity
ABSTRACT. Histogram-based anomaly detectors have gained significant attention and application in the field of intrusion detection because of the high efficiency in identifying anomalous patterns. However, they fail to explain why a given data point is flagged as an anomaly. Outlying aspect mining aims to detect aspects (a.k.a subspaces) where a given anomaly significantly differs from others. In this paper, we have proposed a simple but effective and efficient solution – HMass. In addition to detecting anomalies, HMass provides explanations on why the points are anomalous. The effectiveness and efficiency of HMass are evaluated using comparative analysis on seven cyber security datasets, covering the tasks of anomaly detection and outlying aspect mining.
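The general histogram-based scoring idea can be sketched as below, in the style of HBOS; this is only an illustration of the family of detectors the abstract refers to, not the HMass algorithm or its outlying-aspect search.

import numpy as np

def histogram_anomaly_scores(X, bins=20):
    # sum over features of the negative log density of the bin each value falls into
    # (a higher score means a more anomalous point)
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    scores = np.zeros(n)
    for j in range(d):
        hist, edges = np.histogram(X[:, j], bins=bins, density=True)
        idx = np.clip(np.digitize(X[:, j], edges[1:-1]), 0, bins - 1)
        scores += -np.log(hist[idx] + 1e-12)
    return scores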
Probabilistic AutoRegressive Neural Networks for Accurate Long-range Forecasting
ABSTRACT. Forecasting time series data is a critical area of research with applications spanning from stock prices to early epidemic prediction. While numerous statistical and machine learning methods have been proposed, real-life prediction problems often require hybrid solutions that bridge classical forecasting approaches and modern neural network models. In this study, we introduce the Probabilistic AutoRegressive Neural Networks (PARNN), capable of handling complex time series data exhibiting non-stationarity, nonlinearity, non-seasonality, long-range dependence, and chaotic patterns. PARNN is constructed by improving autoregressive neural networks (ARNN) using autoregressive integrated moving average (ARIMA) feedback error, combining the explainability, scalability, and "white-box-like" prediction behavior of both models. Notably, the PARNN model provides uncertainty quantification through prediction intervals, setting it apart from advanced deep learning tools. Through comprehensive computational experiments, we evaluate the performance of PARNN against standard statistical, machine learning, and deep learning models, including Transformers, NBeats, and DeepAR. Diverse real-world datasets from macroeconomics, tourism, epidemiology, and other domains are employed for short-term, medium-term, and long-term forecasting evaluations. Our results demonstrate the superiority of PARNN across various forecast horizons, surpassing the state-of-the-art forecasters. The proposed PARNN model offers a valuable hybrid solution for accurate long-range forecasting. By effectively capturing the complexities present in time series data, it outperforms existing methods in terms of accuracy and reliability. The ability to quantify uncertainty through prediction intervals further enhances the model's usefulness in decision-making processes.
FEGI: A Fusion Extractive-Generative Model for Dialogue Ellipsis and Coreference Integrated Resolution
ABSTRACT. Dialogue systems in the open domain have achieved great success due to easily obtained single-turn corpora and the development of deep learning, but the multi-turn scenario remains a challenge because of frequent coreference and information omission. In this paper, we aim to quickly retrieve the omitted or coreferred expressions contained in the dialogue history and restore them into the incomplete utterance. Jointly inspired by the generative method for text generation and the extractive method for span extraction, we propose a fusion extractive-generative dialogue ellipsis and coreference integrated resolution model (FEGI). In detail, we introduce two training tasks, OMIT and SPAN, to extract missing semantic expressions, then integrate the obtained expressions into the decoding initial and copy stages of the generative model respectively. To support the training tasks, we introduce an algorithm for secondary reconstruction annotation based on existing publicly available corpora via an unsupervised technique, which works even when the missing semantic expressions are not annotated. Moreover, we conduct dozens of joint learning experiments on the CamRest676 and RiSAWOZ datasets. Experimental results show that our proposed model significantly outperforms the state-of-the-art models in terms of quality.
POI Recommendation based on Double-level Spatio-temporal Relationship in Locations and Categories
ABSTRACT. The sparsity of user check-in trajectory data is a great challenge for point of interest (POI) recommendation. To alleviate data sparsity, existing research often utilizes the geographic and time information in check-in trajectory data to discover hidden spatio-temporal relations. However, existing models only consider the spatio-temporal relationship between locations, ignoring that between POI categories. To further reduce the negative impact of data sparsity, and motivated by the attention-based integration of spatio-temporal relationships in LSTPM, this paper proposes a POI recommendation model based on the double-level spatio-temporal relationship in locations and categories (POI2TS). POI2TS integrates the spatio-temporal relationship between locations and that between categories through an attention mechanism to more accurately capture users' preferences. Test results on the NYC and TKY datasets show that POI2TS is more accurate than state-of-the-art models, which verifies that integrating the spatio-temporal relationship between locations and that between categories can effectively improve POI recommendation.
Label Selection Algorithm Based on Ant Colony Optimization and Reinforcement Learning for Multi-label Classification
ABSTRACT. Multi-label classification handles scenarios where each instance can be annotated with multiple non-exclusive but semantically related labels simultaneously. Despite significant progress, multi-label classification is still challenging because many emerging applications lead to high-dimensional label spaces. Researchers have applied feature dimensionality reduction techniques to the label space by using label correlation information, yielding two families of techniques: label embedding and label selection. There have been many successful algorithms for label embedding, but less attention has been paid to label selection. In this paper, we propose a label selection algorithm for multi-label classification, LS-AntRL, which combines ant colony optimization and reinforcement learning. The method helps the ant colony algorithm search the space more effectively by using a temporal difference (TD) reinforcement learning algorithm that learns directly from the ants' experience. For heuristic learning, we model the ant colony optimization problem as a reinforcement learning problem, that is, we model label selection as a Markov decision process, where each label represents a state and the unvisited labels an ant can select represent the set of actions. The state transition rules of the ant colony optimization algorithm constitute the transition function of the Markov decision process, and the state value function is updated by the TD formula to form a heuristic function in ant colony optimization. After performing label selection, we train a binary weighted neural network to recover the low-dimensional label space back to the original label space. We apply the above model to five benchmark datasets with more than 100 labels. Experimental results show that our method achieves better classification performance than other advanced methods in terms of two evaluation metrics (Precision@n and DCG@n).
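The following sketch illustrates, under stated assumptions, how a TD-learned state-value function can serve as the heuristic inside an ant-colony label-selection loop, as the abstract above describes. The per-label reward, all hyperparameters, and the exponential heuristic transform are illustrative choices, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

n_labels, n_select, n_ants, n_iters = 20, 5, 10, 50
alpha, beta, rho = 1.0, 2.0, 0.1          # pheromone/heuristic weights, evaporation rate
lr, gamma = 0.1, 0.9                       # TD(0) learning rate and discount

reward = rng.random(n_labels)              # hypothetical per-label reward (e.g., label relevance)
pheromone = np.ones(n_labels)
value = np.zeros(n_labels)                 # state-value function learned by TD(0)

def heuristic(v):
    # shift values so they are positive and can act as an ACO heuristic term
    return np.exp(v - v.max())

for _ in range(n_iters):
    best_path, best_score = None, -np.inf
    for _ in range(n_ants):
        visited, path = np.zeros(n_labels, bool), []
        for _ in range(n_select):
            probs = (pheromone ** alpha) * (heuristic(value) ** beta)
            probs[visited] = 0.0
            probs /= probs.sum()
            nxt = rng.choice(n_labels, p=probs)
            if path:                       # TD(0) update from the previous label to the chosen one
                s = path[-1]
                value[s] += lr * (reward[nxt] + gamma * value[nxt] - value[s])
            visited[nxt] = True
            path.append(nxt)
        score = reward[path].sum()
        if score > best_score:
            best_path, best_score = path, score
    pheromone *= (1.0 - rho)               # evaporation
    pheromone[best_path] += best_score     # reinforce the best ant's labels

print("selected labels:", sorted(best_path))
```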
Nonlinear Multiple-delay Feedback Based Kernel Least Mean Square Algorithm
ABSTRACT. In this paper, a novel algorithm called nonlinear multiple-delay feedback kernel least mean square (NMDF-KLMS) is proposed by introducing nonlinear multiple-delay feedback into the framework of multikernel adaptive filtering. The proposed algorithm incorporates the nonlinear multiple-delay feedback to enhance filtering performance in comparison with kernel adaptive filtering algorithms using linear feedback. Furthermore, a theoretical mean-square convergence analysis of NMDF-KLMS is also conducted. Simulation results on chaotic time-series prediction and real-world data applications show that NMDF-KLMS achieves a faster convergence rate and superior filtering accuracy.
Binary Mother Tree Optimization Algorithm for 0/1 Knapsack Problem
ABSTRACT. The knapsack problem is a well-known strongly NP-complete problem in which the profit of a collection of items placed in the knapsack is maximized under a weight capacity constraint. In this paper, a novel Binary Mother Tree Optimization Algorithm (BMTO) and a Knapsack Problem Framework (KPF) are proposed to find an efficient solution for the 0/1 knapsack problem in a short time. The proposed BMTO method is built on the original MTO and a binary module that enables optimization in a discrete space. The binary module converts a set of real numbers, equal in size to the dimension of the knapsack problem, into binary values using a threshold and the sigmoid function. The KPF, in turn, makes implementing a metaheuristic algorithm to solve the knapsack problem much simpler. To assess the performance of the proposed solutions, extensive experiments are conducted, including several statistical analyses of the resulting solutions on two sets of knapsack instances (small and large scale). The results demonstrate that BMTO can produce efficient solutions for knapsack instances of different sizes in a short time, and that it outperforms the Binary Particle Swarm Optimization (BPSO) and Binary Bacterial Foraging (BBF) algorithms in terms of best solution and time. In addition, the results of BPSO and BBF show the effectiveness of KPF compared to the results in the literature.
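A minimal sketch of the binary module described above: real-valued candidate positions are passed through the sigmoid function and thresholded into a 0/1 selection vector, which is then scored on a toy 0/1 knapsack instance. The 0.5 threshold and the zero-profit penalty for infeasible selections are assumptions; the paper's exact threshold and fitness handling may differ.

```python
import numpy as np

def binarize(position, threshold=0.5):
    """Map a real-valued position vector to a 0/1 item-selection vector
    via the sigmoid function and a fixed threshold (threshold value assumed)."""
    return (1.0 / (1.0 + np.exp(-position)) > threshold).astype(int)

def knapsack_fitness(selection, profits, weights, capacity):
    """Total profit if the weight constraint holds, otherwise 0 (penalty scheme assumed)."""
    if weights @ selection > capacity:
        return 0.0
    return float(profits @ selection)

# toy 0/1 knapsack instance
rng = np.random.default_rng(1)
profits = rng.integers(1, 20, size=15)
weights = rng.integers(1, 10, size=15)
capacity = 40

position = rng.normal(size=15)            # a candidate produced by the real-valued optimizer
selection = binarize(position)
print(selection, knapsack_fitness(selection, profits, weights, capacity))
```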
A Comprehensive Review of Arabic Question Answering Datasets
ABSTRACT. The research community has shown significant interest in the field of Question Answering (QA) due to the strong relevance of QA applications. In recent years, there has been a significant increase in the availability of publicly accessible datasets aimed at advancing research on Arabic QA systems. This survey identifies, summarizes, and analyzes current Arabic QA datasets, covering monolingual, multilingual, and cross-lingual resources, and provides a comprehensive, multi-faceted classification. Furthermore, this study aims to guide research in Arabic QA by providing the latest updates on the state of the art in this field and identifying shortcomings in the current datasets so that more substantial and improved collections can be developed. Finally, we discuss the existing challenges in Arabic QA datasets and highlight their potential benefits for future research.
An End-To-End Structure with novel position mechanism and improved EMD for Stock Forecasting
ABSTRACT. As a branch of time series forecasting, stock movement forecasting is one of the challenging problems for investors and researchers. Since Transformer was introduced to analyze financial data, many researchers have dedicated themselves to forecasting stock movement using the Transformer or attention mechanisms. However, existing research mostly focuses on individual stock information but ignores stock market information and high noise in stock data. In this paper, we propose a novel method using the attention mechanism in which both stock market information and individual stock information are considered. Meanwhile, we propose a novel EMD-based algorithm for reducing short-term noise in stock data. Two randomly selected exchange-traded funds (ETFs) spanning over ten years from US stock markets are used to demonstrate the superior performance of the proposed attention-based method. The experimental analysis demonstrates that the proposed attention-based method significantly outperforms other state-of-the-art baselines.
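The abstract above mentions an EMD-based algorithm for removing short-term noise from stock data. The sketch below shows only the generic idea of EMD denoising, rebuilding a series after discarding its highest-frequency intrinsic mode function; it is not the paper's improved EMD algorithm, and it assumes the third-party PyEMD package is installed.

```python
import numpy as np
from PyEMD import EMD   # assumption: the PyEMD package is available

def emd_denoise(signal, drop_imfs=1):
    """Crude EMD-based denoising: decompose the series and rebuild it without
    the first `drop_imfs` (highest-frequency) intrinsic mode functions."""
    imfs = EMD().emd(signal)          # rows are IMFs, ordered from highest to lowest frequency
    return imfs[drop_imfs:].sum(axis=0)

# toy example: trend plus oscillation plus short-term noise
t = np.linspace(0, 10, 500)
prices = 0.5 * t + np.sin(2 * np.pi * t) + 0.2 * np.random.default_rng(0).normal(size=t.size)
smoothed = emd_denoise(prices, drop_imfs=1)
print(prices[:5], smoothed[:5])
```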
ABSTRACT. In a recent paper, a distributed $k$WTA model has been introduced, in which the recurrent connections are defined by a Laplacian matrix and the neuronal function is defined as a Heaviside function. While the recent model is defined as a continuous-time model, this paper presents a discrete-time version of the model, and the conditions for ensuring correct output and finite-time convergence are shown. We then introduce the application of the discrete-time distributed $k$WTA as a decentralized mechanism for auctions. Technical problems regarding its actual implementation are outlined.
A Two-Stage Active Learning Algorithm for NLP Based on Feature Mixing
ABSTRACT. Active learning (AL) aims to improve model performance with minimal data annotation. While recent AL studies have utilized feature mixing to identify unlabeled instances with novel features, applying it to natural language processing (NLP) tasks has been challenging due to the discrete nature of text tokens and the limited contribution of some novel features. To address these issues, we propose a two-stage acquisition method based on feature mixing for NLP tasks. We first create a mixed feature for both labeled and unlabeled instances to identify the features in the unlabeled instances that the model cannot recognize. Next, we evaluate the contribution of these novel features to the model using the entropy of the nearest labeled neighbors. The proposed method enables the model to select the most informative samples in the unlabeled pool. Experiments on sentiment analysis, topic classification, and natural language inference validate that our method not only outperforms other AL approaches but also improves the efficiency of batch data acquisition.
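A possible reading of the second stage described above is to score each unlabeled instance by the entropy of the class distribution among its nearest labeled neighbors in feature space. The sketch below implements only that scoring step; the feature-mixing first stage is omitted, and the Euclidean distance, k value, and batch size are assumptions.

```python
import numpy as np

def neighbor_entropy_scores(unlabeled_feats, labeled_feats, labeled_y, n_classes, k=10):
    """Score each unlabeled instance by the entropy of the class distribution
    of its k nearest labeled neighbors (Euclidean distance; details assumed)."""
    scores = np.empty(len(unlabeled_feats))
    for i, x in enumerate(unlabeled_feats):
        d = np.linalg.norm(labeled_feats - x, axis=1)
        nn_labels = labeled_y[np.argsort(d)[:k]]
        p = np.bincount(nn_labels, minlength=n_classes) / k
        p = p[p > 0]
        scores[i] = -(p * np.log(p)).sum()
    return scores

rng = np.random.default_rng(0)
labeled_feats, labeled_y = rng.normal(size=(200, 32)), rng.integers(0, 4, 200)
unlabeled_feats = rng.normal(size=(1000, 32))
scores = neighbor_entropy_scores(unlabeled_feats, labeled_feats, labeled_y, n_classes=4)
query_idx = np.argsort(-scores)[:16]      # acquire the most ambiguous batch for annotation
```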
Accelerate Support Vector Clustering via Spectral Data Compression
ABSTRACT. This paper proposes a novel framework for accelerating support vector clustering (SVC). The proposed method first computes much smaller compressed data sets while preserving the key cluster properties of the original data sets, based on a novel spectral data compression approach. The resulting spectrally-compressed data sets are then leveraged to develop a fast and high-quality algorithm for support vector clustering. We conducted extensive experiments using real-world data sets and obtained very promising results. The proposed method achieves 100X and 115X speedups over the state-of-the-art SVC method on the Pendigits and USPS data sets, respectively, while achieving even better clustering quality. To the best of our knowledge, this represents the first practical method for high-quality and fast SVC on large-scale real-world data sets.
Enhancing Spatial Consistency and Class-level Diversity for Segmenting Fine-grained Objects
ABSTRACT. Semantic segmentation is a fundamental computer vision task attracting a lot of attention. However, limited work has focused on semantic segmentation in the fine-grained class scenario, which involves more classes and greater inter-class similarity. Due to the lack of data available for this task, we establish two segmentation benchmarks, CUB-seg and FGSCR42-seg, based on the CUB and FGSCR42 datasets. To address the two major problems in this task, spatial inconsistency and confusion between extremely similar classes, we propose the Spatial Consistency and Class-level Diversity enhancement Network. First, we build the Spatial Consistency Enhancement Module to take advantage of the low-frequency information in the features, enhancing spatial consistency. Second, a Fine-grained Regions Contrastive Loss is designed to make the features of different classes more discriminative, promoting class-level diversity. Extensive experiments show that our method can significantly improve performance compared to baseline models. A visualization study also demonstrates the effectiveness of our method in enhancing spatial consistency and class-level diversity.
Effective Domain Adaptation for Robust Dysarthric Speech Recognition
ABSTRACT. By transferring knowledge from abundant normal speech to limited dysarthric speech, dysarthric speech recognition (DSR) has witnessed significant progress. However, existing adaptation techniques mainly focus on fully leveraging normal speech while neglecting the sparse nature of dysarthric speech, which poses a great challenge for DSR training in low-resource scenarios. In this paper, we present an effective domain adaptation framework to build robust DSR systems with scarce target data. A joint data preprocessing strategy is employed to alleviate the sparsity of dysarthric speech and close the gap between source and target domains. To enhance adaptability to dysarthric speakers across different severity levels, a Domain-adapted Transformer model is devised to learn both domain-invariant and domain-specific features. Experimental results demonstrate that the proposed methods achieve impressive performance on both speaker-dependent and speaker-independent DSR tasks. Notably, even with half of the target training data, our DSR systems still maintain high accuracy on speakers with severe dysarthria.
An Adaptive Detector for Few Shot Object Detection
ABSTRACT. Few-shot object detection has made progress in recent years. However, most research assumes that base and new classes come from the same domain. In real-world applications, they often come from different domains, resulting in poor adaptability of existing methods. To address this problem, we designed an adaptive few-shot object detection framework. Building on the Meta R-CNN framework, we added an image domain classifier after the backbone's last layer to reduce domain discrepancy. To avoid the class feature confusion caused by aligning image feature distributions, we also added a feature filter module (CAFFM) to filter out features irrelevant to specific classes. We tested our method on three base/new splits and found significant performance improvements compared to the base model Meta R-CNN. On base/new split 2, mAP50 increased by approximately 8%, and on the remaining two splits, mAP50 improved by approximately 3%. Our method outperforms state-of-the-art methods in most cases across the three base/new splits, validating the efficacy and generality of our approach.
Design of Memristor-based Binarized Multi-layer Neural Network with High Robustness
ABSTRACT. Memristor-based neural networks are promising for alleviating the bottleneck of neuromorphic computing devices based on the von Neumann architecture. Various memristor-based neural networks, built with different memristor-based layers, have been proposed in recent years. However, memristor-based neural networks with full-precision weight values are affected by memristor conductance variations, which negatively impact performance. In contrast, binarized neural networks have only two weight states, so binarized networks built with memristors suffer little from memristor conductance variations. In this paper, a memristor-based batch normalization layer and a binarized fully connected layer are designed. Based on the proposed layers, a memristor-based binarized multi-layer neural network is built. The effectiveness of the network is substantiated through simulation experiments on pattern classification tasks. The robustness of the network is also explored, and the results show that the network is highly robust to such variations.
Multiclass Classification and Defect Detection of Steel tube using modified YOLO
ABSTRACT. Steel tubes are widely used in hazardous high-pressure environments such as petroleum, chemicals, natural gas and shale gas. Defects in steel tubes have serious negative consequences. Using deep learning object recognition to identify and detect defects can greatly improve inspection efficiency and drive industrial automation. In this work, we use the well-known YOLOv7 (You Only Look Once version 7) deep learning model and propose improvements to achieve accurate defect detection in steel tube images. First, the classification of the dataset is checked using a sequential model and AlexNet. A Coordinate Attention (CA) mechanism is then integrated into the YOLOv7 backbone network to improve the expressive power of the feature maps. Additionally, the SIoU (SCYLLA-Intersection over Union) loss function is used to speed up convergence in the presence of class imbalance in the dataset. Experimental results show that the evaluation metrics of the optimized and modified YOLOv7 algorithm outperform other models. This study demonstrates the effectiveness of the method in improving detection performance and provides a more effective solution for steel tube defect detection.
Efficient Prompt Tuning for Vision and Language Models
ABSTRACT. Recently, large-scale pre-trained visual language models have demonstrated excellent performance in many downstream tasks. A more efficient adaptation method for different downstream tasks is prompt tuning, which fixes the parameters of the visual language model and adjusts only the prompt parameters when adapting to downstream tasks, using the knowledge learned by the visual language model during pre-training to solve downstream problems. However, the loss of the downstream task and the original loss of the visual language model are not exactly the same during training. For example, CLIP uses a contrastive learning loss to train the model, while the downstream image classification task uses the cross-entropy loss commonly used in classification problems. Different losses have different guiding effects on the task, and the trend of the visual language model task accuracy during training also differs from that of the downstream task. The choice of an appropriate loss function and a reasonable prompt tuning method therefore have a great impact on model performance. We propose a more efficient prompt tuning method for CLIP; experiments on 11 datasets demonstrate that our method achieves better performance and faster convergence on the downstream task.
Category-wise Fine-Tuning for Image Multi-label Classification with Partial Labels
ABSTRACT. Image multi-label classification datasets are often partially labeled (for each sample, only the labels of some categories are known). One popular solution for training convolutional neural networks is to treat all unknown labels as negative labels, referred to as Negative mode. However, it produces wrong labels unevenly across categories, decreasing the binary classification performance on different categories to varying degrees. On the other hand, although Ignore mode, which ignores the contributions of unknown labels, may be less effective than Negative mode, it ensures the data contain no additional wrong labels, which is what Negative mode lacks. In this paper, we propose Category-wise Fine-Tuning (CFT), a new post-training method that can be applied to a model trained with Negative mode to improve its performance on each category independently. Specifically, CFT uses Ignore mode to fine-tune the logistic regressions (LRs) in the classification layer one by one. The use of Ignore mode reduces the performance decrease caused by the wrong labels of Negative mode during training. In particular, a Genetic Algorithm (GA) and binary cross-entropy are used in CFT for fine-tuning the LRs. The effectiveness of our methods is evaluated on the CheXpert competition dataset, where it achieves state-of-the-art results to our knowledge. A single model submitted to the competition server for official evaluation achieves mAUC 91.82% on the test set, which is the highest single-model score in the leaderboard and the literature. Moreover, our ensemble achieves mAUC 93.33% on the test set, superior to the best in the leaderboard and the literature (93.05%). The effectiveness of our methods is also evaluated on partially labeled versions of the MS-COCO dataset.
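To make the Ignore-mode idea concrete, the following minimal sketch computes a per-category binary cross-entropy that simply skips unknown labels, the kind of objective one could use when fine-tuning a single category's logistic regression. The −1 encoding for unknown labels is an assumption, and the GA-based fine-tuning mentioned in the abstract is not shown.

```python
import numpy as np

def ignore_mode_bce(logits, targets):
    """Binary cross-entropy over one category that skips unknown labels.
    Targets: 1 = positive, 0 = negative, -1 = unknown (encoding assumed)."""
    mask = targets != -1
    if not mask.any():
        return 0.0
    p = 1.0 / (1.0 + np.exp(-logits[mask]))
    y = targets[mask]
    eps = 1e-7
    return float(-(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)).mean())

logits = np.array([2.1, -0.3, 0.8, -1.5])
targets = np.array([1, -1, 0, -1])        # only two samples carry known labels for this category
print(ignore_mode_bce(logits, targets))
```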
Semantic Segmentation of Multispectral Remote Sensing Images with Class Imbalance Using Contrastive Learning
ABSTRACT. Affected by the distribution differences of ground objects, multispectral remote sensing images are characterized by a long-tailed distribution: a few classes (head classes) contain many instances, while most classes (tail classes, also called rare classes) contain only a few instances. Such class-imbalanced data bring a great challenge to the semantic segmentation of multispectral remote sensing images. To address this problem, this paper proposes a novel contrastive learning method (CoLM) for semantic segmentation of multispectral remote sensing images with class imbalance. First, we propose a semantic consistency constraint to maximize the similarity of semantic feature embeddings of the same class in the feature space. Then, a rebalancing sampling strategy is proposed to dynamically select the hard-to-predict samples in each class as anchor samples, imposing additional supervision, and a pixel-level supervised contrastive loss is used to improve the separability of rare classes in the decision space. Experimental results on two long-tailed remote sensing datasets show that our method can be easily integrated into existing segmentation models, effectively improving the segmentation accuracy of rare classes without incurring additional inference costs.
Curve Enhancement: A No-Reference Method for Low-light Image Enhancement
ABSTRACT. In this paper, we introduce an end-to-end method for enhancing low-light images without relying on paired datasets. Our solution is reference-free and unsupervised, effectively addressing the lack of real-world low-light paired datasets. Specifically, we design a Brightness Boost Curve (BB-Curve) that enhances the brightness of image pixels through a fine-grained mapping. Additionally, we propose a lightweight deep neural network that estimates the curve parameters and evaluates the quality of the enhanced images using a series of no-reference loss functions. We validate our method through experiments on several datasets and provide both subjective and quantitative evaluations to demonstrate its significant brightness enhancement capabilities, free from smearing and artifacts. Notably, our approach generalizes well while retaining details that are crucial for image interpretation. With its reduced network structure and simple curve mapping, our model achieves superior training speed and the best prediction performance among the compared methods.
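The exact form of the BB-Curve is not given in the abstract, so the sketch below only illustrates the general pattern of curve-based enhancement: a per-pixel parameter map, predicted by a network, steers a simple monotone brightening curve applied iteratively. The quadratic form, the iteration count, and the constant parameter map are all assumptions.

```python
import numpy as np

def brightness_curve(image, alpha, n_iter=4):
    """Illustrative pixel-wise brightening curve (the BB-Curve's exact form is not
    specified here; this quadratic map and iteration count are assumptions).
    `image` is in [0, 1]; `alpha` is a per-pixel parameter map in [0, 1]."""
    x = image.astype(np.float64)
    for _ in range(n_iter):
        x = x + alpha * x * (1.0 - x)     # monotone map that keeps values inside [0, 1]
    return x

low_light = np.random.default_rng(0).random((64, 64, 3)) * 0.3
alpha = np.full_like(low_light, 0.8)      # stand-in for the network's predicted curve parameters
enhanced = brightness_curve(low_light, alpha)
print(low_light.mean(), enhanced.mean())
```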
Text-to-Image Synthesis With Threshold-Equipped Matching-Aware GAN
ABSTRACT. In this paper, we propose a novel Threshold-Equipped Matching-Aware Generative Adversarial Network (ETMA-GAN) for text-to-image synthesis. By filtering out inaccurate negative samples, the discriminator can more accurately determine whether the generator has produced images that match the descriptions. In addition, to enhance the discriminator's ability to capture key semantic information, a word-level fine-grained supervisor is constructed, which in turn drives the generative model to achieve high-quality synthesis of image details. Numerous experiments and ablation studies on the Caltech-UCSD Birds 200 (CUB) and Microsoft Common Objects in Context (MS COCO) datasets demonstrate the effectiveness and superiority of the proposed method over existing methods. In terms of both subjective and objective evaluations, the proposed model has clear advantages over recent state-of-the-art methods, especially in synthesizing images with a higher degree of realism and better conformity to the text descriptions.
Effi-Seg: Rethinking EfficientNet Architecture for Real-time Semantic Segmentation
ABSTRACT. A popular strategy for designing a semantic segmentation model is to utilize a well-established pre-trained Deep Convolutional Neural Network (DCNN) as a feature extractor and replace the classification head with a decoder to generate segmented outputs. The advantage of this strategy is the ability to obtain a ready-made backbone with additional knowledge. However, it has several disadvantages, such as a lack of architectural knowledge, a significant semantic gap among the deep feature maps, and a lack of control over architectural changes to reduce memory overhead. To overcome these issues, we first study the complete architectures of EfficientNetV1 and EfficientNetV2, analyzing the architectural and performance gaps. Based on this analysis, we develop an efficient segmentation model called Effi-Seg by implementing several architectural changes to the backbone, which leads to better semantic segmentation results with improved efficiency. To enhance contextualization and achieve accurate object localization in the scene, we introduce a feature refinement module (FRM) and a semantic aggregation module (SAM) on the decoder side. The complete segmentation network comprises only 1.49 million parameters and 8.4 GFLOPs. We evaluate the proposed model on three popular benchmarks, and it demonstrates highly competitive results on all three datasets while maintaining excellent efficiency.
Spatiotemporal PM2.5 Pollution Prediction Using Cloud-Edge Intelligence
ABSTRACT. This study introduces a novel spatiotemporal method to predict fine dust (PM2.5) concentration levels in the air, a significant environmental and health challenge, particularly in urban and industrial locales. We capitalize on AI-powered edge computing and federated learning, applying historical data spanning 2018 to 2022 collected from four strategic sites in Mumbai: Kurla, Bandra-Kurla, Nerul, and Sector-19a-Nerul. These locations are known for high industrial activity and heavy traffic, contributing to increased pollution exposure. Our spatiotemporal model integrates the strengths of Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks, with the goal of predicting PM2.5 concentrations 24 hours into the future. Other machine learning algorithms, namely Support Vector Regression (SVR), Gated Recurrent Units (GRU), and Bidirectional LSTM (BiLSTM), were evaluated within the federated learning framework. Performance was assessed using Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and R^2. The preliminary findings suggest that our CNN-LSTM model outperforms the alternatives, with an MAE of 0.466, RMSE of 0.522, and R^2 of 0.9877.
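A minimal sketch of a CNN-LSTM forecaster of the kind described above: 1D convolutions extract local patterns from a multivariate history window, an LSTM models temporal dependence, and a linear head emits the next 24 hourly values. All layer sizes, the 72-hour window, and the five input features are assumptions, not the study's configuration.

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    """Minimal CNN-LSTM forecaster for 24-step-ahead prediction (sizes assumed)."""
    def __init__(self, n_features=5, horizon=24):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.lstm = nn.LSTM(32, 64, batch_first=True)
        self.head = nn.Linear(64, horizon)

    def forward(self, x):                 # x: (batch, window, n_features)
        z = self.conv(x.transpose(1, 2))  # (batch, 32, window)
        out, _ = self.lstm(z.transpose(1, 2))
        return self.head(out[:, -1])      # forecast from the last hidden state

model = CNNLSTM()
y_hat = model(torch.randn(8, 72, 5))      # 72-hour history -> 24-hour PM2.5 forecast
print(y_hat.shape)                        # torch.Size([8, 24])
```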
Rapid APT Detection in Resource-Constrained IoT Devices Using Global Vision Federated Learning (GV-FL)
ABSTRACT. The increasing proliferation of Internet of Things (IoT) devices in industrial applications presents unique challenges in cybersecurity, specifically in detecting Advanced Persistent Threats (APTs). The limitations of traditional IoT devices, such as low computational power, further exacerbate these challenges. This paper proposes a novel approach to this problem, Global Vision Federated Learning (GV-FL), which leverages federated learning (FL) for efficient and effective APT detection in resource-constrained IoT devices. We comprehensively analyze APT attacks and their stealthy characteristics and highlight the shortcomings of existing detection methods. The GV-FL method presented in this work offers a unique solution by providing a global perspective of the IoT network, thus enabling rapid detection of APTs even on devices with limited resources. Our experimental evaluation demonstrates that GV-FL not only outperforms existing solutions in terms of detection accuracy and speed but also significantly reduces resource consumption, proving to be a promising approach to APT detection in IoT devices. We conclude by exploring potential future work and improvements to the GV-FL algorithm, setting the stage for a new paradigm in IoT cybersecurity.
PSO-enabled Federated Learning for detecting ships in supply chain management
ABSTRACT. Supply chain management plays a vital role in the efficient and reliable movement of goods across various platforms, which involves several entities and processes. Detecting ships and their related activities is of paramount importance in order to ensure successful logistics and security. In order to improve logistics planning, security, and risk management, a strong framework is required that offers an efficient and privacy-preserving solution for identifying ships in supply chain management.
In this paper, we propose a novel approach called PSO-enabled FL (PSO-FL) for ship detection in supply chain management. The proposed PSO-FL framework leverages the advantages of both federated learning (FL) and Particle Swarm Optimization (PSO) to address the challenges of ship detection in supply chain management. Thanks to the distributed nature of FL, a ship identification model can be trained cooperatively using data from several supply chain stakeholders, including port authorities, shipping firms, and customs agencies. By optimizing the selection of appropriate participants for model training, the PSO algorithm improves FL performance. We conduct extensive experiments using real-world ship data gathered from various sources to evaluate the effectiveness of our PSO-FL approach. The results demonstrate that our framework achieves a superior ship detection accuracy of 94.88\% compared to traditional centralized learning approaches and standalone FL methods. Furthermore, the PSO-FL framework demonstrates robustness, scalability, and privacy preservation, making it suitable for large-scale deployment in complex supply chain management scenarios.
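To illustrate how PSO could drive participant selection in an FL round, the sketch below runs a standard binary-PSO search over client subsets, scoring each subset with a hypothetical per-client utility under a participation budget. The utility values, the budget, the penalty for infeasible subsets, and all PSO coefficients are assumptions; the paper's actual fitness function is not specified here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_clients, swarm_size, n_rounds, budget = 20, 15, 30, 8
w, c1, c2 = 0.7, 1.5, 1.5                 # inertia and acceleration coefficients

# hypothetical per-client utility, e.g. a proxy for each client's expected contribution
client_utility = rng.random(n_clients)

def fitness(mask):
    """Mean utility of the selected clients; empty or over-budget selections are penalized."""
    if mask.sum() == 0 or mask.sum() > budget:
        return -1.0
    return float(client_utility[mask.astype(bool)].mean())

vel = np.zeros((swarm_size, n_clients))
masks = rng.integers(0, 2, size=(swarm_size, n_clients))
pbest = masks.copy()
pbest_fit = np.array([fitness(m) for m in masks])
gbest = pbest[pbest_fit.argmax()].copy()

for _ in range(n_rounds):
    r1, r2 = rng.random(vel.shape), rng.random(vel.shape)
    vel = w * vel + c1 * r1 * (pbest - masks) + c2 * r2 * (gbest - masks)
    masks = (1.0 / (1.0 + np.exp(-vel)) > rng.random(vel.shape)).astype(int)  # binary PSO update
    fits = np.array([fitness(m) for m in masks])
    improved = fits > pbest_fit
    pbest[improved], pbest_fit[improved] = masks[improved], fits[improved]
    gbest = pbest[pbest_fit.argmax()].copy()

print("selected clients for this FL round:", np.flatnonzero(gbest))
```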
LSiF: Log-Gabor Empowered Siamese Federated Learning for Efficient Obscene Image Classification in the Era of Industry 5.0
ABSTRACT. The widespread presence of explicit content on social media platforms has far-reaching consequences for individuals, relationships, and society as a whole. It is essential to address this issue through effective content moderation, user education, and the development of technologies and policies that promote a safer and healthier online environment. To this end, this research proposes the Log-Gabor Empowered Siamese Federated Learning (LSiF) framework for the precise and efficient classification of obscene images in the era of Industry 5.0. The LSiF framework utilizes a siamese network with two parallel streams, where log-Gabor input and normal raw input are processed simultaneously. This siamese architecture leverages shared weights and parameters, enabling effective learning of distinctive features for class differentiation and pattern recognition. The weight-sharing mechanism enhances the model's ability to generalize, increases its robustness, and improves computational efficiency, making it well suited for resource-constrained and real-time applications. Additionally, federated learning is employed with a client size of three, allowing local model updates on each device. This approach minimizes the need for extensive data transmission to a central server, reducing communication overhead and improving learning efficiency, particularly in environments with limited bandwidth. The proposed LSiF model demonstrates remarkable performance, achieving an accuracy of 94.30\%, precision of 94.00\%, recall of 94.26\%, and F1-Score of 94.17\% with a client size of three.
Privacy-Preserving Travel Time Prediction for Internet of Vehicles: A Crowdsensing and Federated Learning Approach
ABSTRACT. Travel time prediction (TTP) is an important task supporting various applications of the Internet of Vehicles (IoV). Although TTP has been widely investigated in the existing literature, most studies assume that the traffic data for estimating travel time are comprehensive and freely available. However, accurate TTP needs real-time vehicular data so that the prediction can adapt to traffic changes. Moreover, since real-time data contain private vehicle information, TTP requires privacy protection during data processing. In this paper, we propose a novel Privacy-Preserving Travel Time Prediction mechanism for IoVs, PTPrediction, built on crowdsensing and federated learning. In the crowdsensing paradigm, a data curator continually collects traffic data from vehicles for travel time prediction. To protect vehicle privacy, we use federated learning so that vehicles can help the data curator train the prediction model without revealing their original data. We also design a spatial prefix encoding method to protect vehicles' location information, along with a ciphertext-policy attribute-based encryption (CP-ABE) mechanism to protect the curator's prediction model. We evaluate PTPrediction in terms of MAE, MSE, and RMSE on real-world traffic datasets. The experimental results show that our mechanism offers higher prediction accuracy and stronger privacy protection compared with existing methods.
Research on automatic segmentation algorithm of brain tumor image based on multi-sequence self-supervised fusion in complex scenes
ABSTRACT. Brain tumor segmentation plays a crucial role in medical diagnosis and treatment planning. Extracting tumor information from MRI images is essential but challenging: manual delineation is time-consuming, labor-intensive, and prone to inter-observer variability, whereas accurate segmentation is needed to assess tumor size, location, and characteristics, which in turn inform treatment decisions and prognosis. This paper presents a brain tumor image segmentation framework that addresses these challenges by leveraging multi-sequence information. The framework consists of encoder, decoder, and data fusion modules. The encoder incorporates Bi-ConvLSTM and Transformer models, enabling comprehensive utilization of both local and global details in each sequence. The decoder module employs a lightweight MLP architecture. Additionally, we propose a data fusion module that integrates self-supervised multi-sequence segmentation results; it learns the weights of each sequence's prediction result in an end-to-end manner, ensuring robust fusion. Experimental validation on the BRATS 2018 dataset demonstrates the excellent performance of the proposed automatic segmentation framework for brain tumor images. Comparative analysis with other multi-sequence fusion segmentation models shows that our framework achieves the highest Dice score in each region.
EEG epileptic seizure classification using hybrid time-frequency attention deep network
ABSTRACT. Epileptic seizure is a complex neurological disorder that is difficult to detect. Observing and analyzing waveform changes in EEG signals is the main way to monitor epileptic activity. However, due to the complexity and instability of EEG signals, the effectiveness of previous EEG-based methods in identifying epileptic regions is not very satisfactory. On the one hand, these methods use the initial time series directly, which reflects only limited epilepsy-related features; on the other hand, they do not fully consider the spatiotemporal dependence of EEG signals. This study proposes a novel EEG-based epileptic seizure classification method built on a hybrid time-frequency attention deep network, namely a time-frequency attention CNN-BiLSTM network (TFACBNet). TFACBNet first uses a time-frequency representation attention module to decompose the input EEG signals into multiscale time-frequency features that provide seizure-relevant information. Then, a hybrid deep network combining a convolutional neural network (CNN) and bidirectional LSTM (BiLSTM) architecture extracts the spatiotemporal dependencies of the EEG signals. Experiments on the benchmark Bonn EEG dataset achieve 98.84% accuracy on the three-category classification task and 92.35% accuracy on the five-category classification task. Our experimental results show that the proposed TFACBNet achieves state-of-the-art classification performance on epileptic EEG signals.
Prior-Enhanced Network for Image-based PM2.5 Estimation from Imbalanced Data Distribution
ABSTRACT. Effective monitoring of PM2.5, a major indicator of air pollution, is crucial to human activities. Compared with traditional physicochemical techniques, image-based methods train PM2.5 estimators using datasets containing pairs of images and PM2.5 levels, which are efficient, economical, and convenient to deploy. However, existing methods either employ handcrafted features, which can be influenced by the image content, or require additional weather data obtained through laborious processes. To estimate the PM2.5 concentration from a single image without requiring extra data, we herein propose a learning-based prior-enhanced (PE) network, comprising a main branch, an auxiliary branch, and a feature fusion attention module, to learn from an input image and its corresponding dark channel (DC) and inverted saturation (IS) maps. In addition, we propose a histogram smoothing (HS) algorithm to address imbalanced data distribution, thereby improving estimation accuracy in cases of heavy air pollution. To the best of our knowledge, this study is the first to address data imbalance in image-based PM2.5 estimation. Finally, we construct a new dataset containing multi-angle images and more than 30 types of air data. Extensive experiments on image-based PM2.5 monitoring datasets verify the superior performance of our proposed neural networks and the HS strategy. The new dataset and codes are available at https://github.com/xxx (open after publication).
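The DC and IS priors mentioned above follow standard definitions, so a small sketch of how such maps could be computed from an RGB image in [0, 1] is given below. The 15x15 patch size for the local minimum filter is an assumption; the paper's preprocessing may differ.

```python
import numpy as np

def dark_channel(img, patch=15):
    """Dark channel: per-pixel minimum over RGB, followed by a local minimum
    filter over patch x patch neighborhoods (patch size assumed)."""
    m = img.min(axis=2)
    pad = patch // 2
    padded = np.pad(m, pad, mode="edge")
    out = np.empty_like(m)
    h, w = m.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out

def inverted_saturation(img, eps=1e-6):
    """Inverted saturation map: 1 - S, where S = 1 - min(R,G,B) / max(R,G,B)."""
    mx, mn = img.max(axis=2), img.min(axis=2)
    return 1.0 - (1.0 - mn / (mx + eps))

img = np.random.default_rng(0).random((64, 64, 3))
dc, isat = dark_channel(img), inverted_saturation(img)   # two prior maps fed to the network
```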
PMFNet: A Progressive Multichannel Fusion Network for Multimodal Sentiment Analysis
ABSTRACT. The core of multimodal sentiment analysis is to find effective encoding and fusion methods to make accurate predictions. However, previous works ignore the problems caused by the sampling heterogeneity of modalities, and visual-audio fusion does not filter out noise and redundancy in a progressive manner. Moreover, current deep learning approaches for multimodal fusion rely on single-channel fusion (a horizontal position or vertical space channel), while models of the human brain highlight the importance of multichannel fusion.
In this paper, to overcome the above problems, we draw inspiration from the perceptual mechanisms of the human brain in neuroscience and propose a novel framework named Progressive Multichannel Fusion Network (PMFNet), which meets the different processing needs of each modality and provides interaction and integration between modalities at different encoded representation densities, enabling them to be better encoded in a progressive manner and fused over multiple channels. Extensive experiments conducted on public datasets demonstrate that our method achieves superior or comparable results to the state-of-the-art models.
Correlation-Distance Graph Learning for Treatment Response Prediction from rs-fMRI
ABSTRACT. Resting-state fMRI (rs-fMRI) functional connectivity (FC) analysis provides valuable insights into the relationships between different brain regions and their potential implications for neurological or psychiatric disorders. However, efforts specifically designed to predict treatment response from rs-fMRI remain limited due to difficulties in understanding the current brain state and the underlying mechanisms driving the observed patterns, which limits the clinical application of rs-fMRI. To overcome this, we propose a graph learning framework that captures comprehensive features by integrating both correlation- and distance-based similarity measures under a contrastive loss. This results in a more expressive framework that captures brain dynamic features at different scales and enables more accurate prediction of treatment response. Our experiments on chronic pain and depersonalization disorder datasets demonstrate that the proposed method outperforms current methods in different scenarios. To the best of our knowledge, we are the first to explore the integration of distance-based and correlation-based neural similarity into graph learning for treatment response prediction.
Asymptotic spatio-temporal averaging of the power of EEG signals for schizophrenia diagnostics
ABSTRACT. Although many sophisticated EEG analysis methods have been developed, they are rarely used in clinical practice. Brain bioelectrical activity is non-stationary and characterized by high daily variations; individual differences are quite significant. Therefore, searching for simple methods that can provide stable results reflecting the basic characteristics of individual neurodynamics is very important. Here, we describe two methods potentially useful in schizophrenia diagnostics. We explore the potential for classification based on features extracted with the asymptotic spatial power distribution method and compare it with the results using microstate parameters and probabilities of transition between microstates. Applied to EEG data with only 16 channels and a low sampling rate, such methods provide quite good discrimination between adolescent schizophrenia patients and a control group of healthy teens.
Non-Contact Respiratory Flow Extraction from Infrared Images Using Balanced Data Classification
ABSTRACT. The COVID-19 pandemic has emphasized the need for non-contact ways of measuring vital signs. However, collecting respiratory signals can be challenging due to the transmission risk and physical discomfort of spirometry devices. This is problematic in places like schools and workplaces where monitoring health is crucial. Infrared fever meters are not accurate enough since fever is not the only symptom of these diseases. The objective of our study was to develop a non-contact method for obtaining Respiratory Flow (RF) from infrared images. We recorded infrared images of three subjects at a distance of 1 meter while they breathed through a spirometry device. We proposed a method called Balanced Data Classification to distribute frames equally into several classes and then used the DenseNet-121 Convolutional Neural Network Model to predict RF signals from the infrared images. Our results showed a high correlation of 97% and a Normalized Mean Absolute Error of 2.3%, which are significant compared to other studies. Our method is fully non-contact and involves standing at a distance of 1 meter from the subjects. In conclusion, our study demonstrates the feasibility of using infrared images to extract RF.
RMPE: Reducing Residual Membrane Potential Error for Enabling High-accuracy and Ultra-low-latency Spiking Neural Networks
ABSTRACT. Spiking neural networks (SNNs) have attracted great attention due to their distinctive properties of low power consumption and high computing efficiency on neuromorphic hardware. An effective way to obtain deep SNNs with competitive accuracy on large-scale datasets is ANN-SNN conversion. However, it requires a long time window to obtain an optimal mapping between the firing rates of SNNs and the activations of ANNs due to conversion error. Compared with the source ANN, the converted SNN usually suffers a huge loss of accuracy at ultra-low latency. In this paper, we first analyze the residual membrane potential error caused by the asynchronous transmission property of spikes at ultra-low latency, and deduce an explicit expression relating the residual membrane potential error (RMPE) to the SNN parameters. We then propose a layer-by-layer calibration algorithm for these SNN parameters to eliminate the RMPE. Finally, a two-stage ANN-SNN conversion scheme is proposed to eliminate the quantization error, the truncation error, and the RMPE separately. We evaluate our method on CIFAR and ImageNet, and the experimental results show that the proposed ANN-SNN conversion method significantly reduces accuracy loss at ultra-low latency. For ImageNet, when T ≤ 64, the latency required by our method is about half that of other methods.
IEEG-CT: A CNN and Transformer Based Method for Intracranial EEG Signal Classification
ABSTRACT. Intracranial electroencephalography (iEEG) is of great importance for the preoperative evaluation of drug-resistant epilepsy. Automatic classification of iEEG signals can speed up the process of epilepsy diagnosis. Existing deep learning-based approaches for iEEG signal classification usually rely on convolutional neural networks (CNNs) and long short-term memory networks. However, these approaches have limitations in classification accuracy. In this paper, we propose a CNN and Transformer based method, named IEEG-CT, for iEEG signal classification. First, IEEG-CT utilizes a deep one-dimensional CNN to extract critical local features from the raw iEEG signals. Second, IEEG-CT employs a Transformer encoder, which leverages a multi-head attention mechanism to capture long-range global information among the extracted features. In particular, we introduce a causal convolution multi-head attention instead of the standard Transformer block to efficiently capture the temporal relations in the input features. Finally, the global features obtained by the Transformer encoder are used for classification. We evaluate the performance of IEEG-CT on two public multicentre iEEG datasets. The experimental results demonstrate that IEEG-CT outperforms state-of-the-art techniques across various evaluation metrics, i.e., accuracy, AUROC, and AUPRC.
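The general 1D-CNN-plus-Transformer pattern described above can be sketched as follows. Note that this uses a stock PyTorch TransformerEncoder as a stand-in for the paper's causal-convolution multi-head attention, and all layer sizes, strides, and the single-channel input are assumptions.

```python
import torch
import torch.nn as nn

class CNNTransformer(nn.Module):
    """Sketch of a 1D-CNN front end followed by a Transformer encoder for iEEG
    windows. A standard nn.TransformerEncoder replaces the paper's
    causal-convolution attention; sizes are illustrative."""
    def __init__(self, n_channels=1, n_classes=2, d_model=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(n_channels, d_model, kernel_size=7, stride=2, padding=3), nn.ReLU(),
            nn.Conv1d(d_model, d_model, kernel_size=5, stride=2, padding=2), nn.ReLU(),
        )
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                           dim_feedforward=128, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.fc = nn.Linear(d_model, n_classes)

    def forward(self, x):                  # x: (batch, n_channels, time)
        z = self.cnn(x).transpose(1, 2)    # (batch, time', d_model) token sequence
        z = self.encoder(z)
        return self.fc(z.mean(dim=1))      # average-pool tokens, then classify

logits = CNNTransformer()(torch.randn(4, 1, 1024))
print(logits.shape)                        # torch.Size([4, 2])
```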
SLG-NET: Subgraph Neural Network with Local-Global Braingraph Feature Extraction Modules and a Novel Subgraph Generation Algorithm for Automated Identification of Major Depressive Disorder
ABSTRACT. Major depressive disorder (MDD) is a severe mental illness that poses significant challenges to both society and families. Recently, several graph neural network (GNN)-based methods have been proposed for MDD diagnosis and have achieved promising results. However, these methods encode the entire braingraph directly and overlook its subgraph structure, which leads to poor specificity to braingraphs. Additionally, the GNN frameworks they use are rudimentary, resulting in insufficient feature extraction capability. In light of these two shortcomings, this paper designs a novel depression diagnosis framework named SLG-NET based on a subgraph neural network. To the best of our knowledge, this study is the first attempt to apply subgraph neural networks to depression diagnosis. To enhance the specificity of our model to braingraphs, we propose a novel subgraph generation algorithm based on sub-structure information of the brain. To improve feature extraction capability, local and global braingraph feature extraction modules are proposed to extract braingraph properties at both local and global levels. Comprehensive experiments performed on the REST-meta-MDD dataset show that the proposed SLG-NET significantly surpasses many state-of-the-art methods with an accuracy of 74.15%. This accuracy indicates that SLG-NET has the potential for auxiliary diagnosis of depression in clinical scenarios. We further analyze the high-order FC network and highlight the hyperconnectivity of the thalamus as a key neurophysiological feature associated with MDD, which may guide the development of biomarkers for the clinical diagnosis of MDD.
Soybean Genome Clustering using Quantum-Based Fuzzy C-Means Algorithm
ABSTRACT. Bioinformatics is a growing area of research in which many computer scientists work to extract useful information from genome sequences in far less time than traditional methods, which may take years. One study area within bioinformatics is protein sequence analysis. In this study, we consider soybean protein sequences, which do not have class information and therefore require clustering. As these sequences are very complex and contain overlapping sequences, the Fuzzy C-Means algorithm may work better than crisp clustering. However, clustering these sequences is very time-consuming, and the results obtained with existing crisp and fuzzy clustering algorithms are not satisfactory. We therefore propose a quantum Fuzzy C-Means algorithm that uses quantum computing concepts to represent the dataset in quantum form. The proposed approach also uses the quantum superposition concept, which speeds up the process and yields better results than the FCM algorithm.
ONEI: Unveiling Route and Phase of Breathing from Snoring Sounds
ABSTRACT. Obstructive Sleep Apnea-Hypopnea Syndrome (OSAHS) is a chronic respiratory disorder caused by the obstruction of the upper airway. The treatment approach for OSAHS varies based on the individual patient's breathing route and phase during snoring. Extensive research has been conducted to identify various snoring patterns, including the breathing route and the breathing phase during snoring. However, the identification of breathing routes and phases during snoring sounds is still in the early stages due to the limited availability of comprehensive datasets with scientifically annotated nocturnal snoring sounds. To address this challenge, this study presents ONEI, an innovative dataset designed for recognizing and analyzing snoring patterns. ONEI encompasses 5171 snoring recordings and is annotated with four distinct labels, namely nasal-dominant inspiratory snoring, nasal-dominant expiratory snoring, oral inspiratory snoring, and oral expiratory snoring. Experimental evaluations reveal discernible acoustic features in snoring sounds, which can be effectively utilized for accurately identifying various snoring types in real-world scenarios. The dataset will be made publicly available for access at https://github.com/emleeee/ONEI.
ABSTRACT. A meal recommender system, as an application of bundle recommendation, aims to provide courses from specific categories (e.g., appetizer, main dish) that together form a meal for a user. Existing bundle recommendation methods learn user preferences from user-bundle interactions to satisfy users' information needs. However, users in food scenarios may have different preferences for different course categories, and effectively accounting for course category constraints when predicting meals for users is a challenge. To this end, we propose CateRec, a category-wise meal recommendation model. Specifically, our model first decomposes interactions and affiliations between users, meals, and courses according to category. Second, graph neural networks are utilized to learn category-wise user and meal representations. Then, the likelihood of user-meal interactions is estimated category by category. Finally, our model is trained with a category-wise enhanced Bayesian Personalized Ranking (BPR) loss. Experiments conducted on two public datasets show that our model outperforms state-of-the-art methods in terms of Recall@K and NDCG@K.
Topic Modeling for Short Texts via Adaptive Pólya Urn Dirichlet Multinomial Mixture
ABSTRACT. Inferring coherent and diverse latent topics from short texts is crucial in topic modeling. Existing approaches leverage the Generalized Pólya Urn (GPU) model to incorporate external knowledge and improve topic modeling performance. While the GPU scheme successfully promotes similarity among words within the same topic, it has two major limitations. Firstly, it assumes that similar words contribute equally to a similar topic, disregarding the distinctiveness of different words. Secondly, it assumes that a specific word should have the same promotion across all topics, overlooking the variations in word importance across different topics. To address these limitations, we propose the Adaptive Pólya Urn Dirichlet Multinomial Mixture (APU-DMM) model, which leverages global topic-word correlation to encourage adaptive weights for different words. This is achieved through a novel Adaptive Pólya Urn (APU) scheme. We conduct experiments on three datasets (Tweet, SearchSnippets, and GoogleNews), and the results demonstrate our model's superiority in terms of topic coherence and topic diversity. This paper contributes to advancing latent topic inference in short texts by introducing the APU-DMM model and showcasing its enhanced performance. Utilizing global topic-word correlation and introducing the APU scheme allows for more adaptive and nuanced modeling, resulting in improved topic coherence and diversity.
Optimal Low-rank QR Decomposition with an Application on RP-TSOD
ABSTRACT. Low-rank matrix approximation has many applications, e.g., denoising, recommender systems, and image reconstruction. Recently, Randomized Pivoted Two-Sided Orthogonal Decomposition (RP-TSOD) was developed to exploit randomization in approximating a high-dimensional matrix using QR decomposition. Instead of random projection, we propose to optimize the projection matrix for low-rank QR decomposition with the goal of minimizing the approximation error. A gradient descent based method is developed to derive optimal projections. The developed techniques can be used not only in RP-TSOD but also in other decompositions. Experimental results on both synthetic and real data show that the proposed method approximates a high-dimensional matrix more accurately than RP-TSOD.
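A minimal sketch of the general idea, under my own assumptions rather than the paper's exact procedure: treat the projection matrix as a trainable parameter, form an orthonormal basis of the sketched range with a QR decomposition, and minimize the Frobenius approximation error by gradient descent (here via PyTorch autograd and Adam).

```python
import torch

torch.manual_seed(0)
m, n, k = 200, 120, 10
A = torch.randn(m, k) @ torch.randn(k, n) + 0.05 * torch.randn(m, n)  # nearly rank-k test matrix

P = torch.randn(n, k, requires_grad=True)     # projection matrix to be optimized
opt = torch.optim.Adam([P], lr=1e-2)

for step in range(300):
    Q, _ = torch.linalg.qr(A @ P)                  # orthonormal basis of the sketched range
    loss = torch.linalg.norm(A - Q @ (Q.T @ A))    # Frobenius approximation error
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final approximation error: {loss.item():.4f}")
```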
ABSTRACT. Traditional Hidden Markov Models (HMM) allow us to discover the latent structure of the observed data (both discrete and continuous). The recently proposed DenseHMM provides hidden-state embeddings and uses a co-occurrence-based learning scheme. However, it is limited to discrete emissions, which is unsuitable for many real-world problems. We address this shortcoming by discretizing observations and using a region-based co-occurrence matrix in the training procedure. This allows embedding hidden states for continuous emission problems and reduces the training time for large sequences. An application of the proposed approach concerns recommender systems, where we try to explain how the current interest of a given user in a given group of products (the current state of the user) influences the saturation of the list of recommended products with that group of products. Computational experiments confirmed that the proposed approach outperformed regular HMMs in several benchmark problems. Although the emissions are estimated roughly, we can accurately infer the states.
EDDVPL: A Web Attribute Extraction Method with Prompt Learning
ABSTRACT. Since labeling web pages requires a lot of human resources and time, web attribute extraction methods based on few-shot learning have gained the attention of researchers. However, these methods still rely heavily on sufficient labeled data from several seed websites. In order to effectively alleviate the lack of domain information, we design a web attribute extraction model based on dual-view prompt learning, named EDDVPL, achieving page-level few-shot learning which uses only a small number of labeled web pages for training. Specifically, we first retrieve semantic prompt information of the DOM tree view by a simplified algorithm to stimulate domain-related knowledge of the pre-trained language model. Then, we introduce task prompt information of the template view by constructing a template indicating the extraction target, which can help the pre-trained language model quickly understand the task of web attribute extraction. Finally, we integrate the dual-view prompt information by template filling to jointly guide the training of the pre-trained language model at the semantic and task levels. Extensive experimental results on the public SWDE dataset show that EDDVPL achieves the best results compared to the baselines.
ASRCD: Adaptive Serial Relation-based Model for Cognitive Diagnosis
ABSTRACT. Cognitive diagnosis (CD) is a fundamental task in the education field. The goal of CD is to recognize the actual concept proficiency of learners. Recent studies prove that concept relations (e.g., between the concepts Addition and Multiplication in mathematics) play a key role in CD, and advanced research has made great contributions to concept relation modeling. However, there remains a gap in the automatic construction and adaptive integration of relation modeling. To address these problems, we propose an Adaptive Serial Relation based model for Cognitive Diagnosis (ASRCD). We first construct a Concept Serial Relation Graph (CSRG) to automatically mine concept relations from the learner response sequence. Then a refined graph attention network (GAT) is designed to weigh the concept relations for aggregation. Finally, we build a general CD model blending concept relations. Leveraging the extensibility of CSRG, it can be applied to most existing CD methods. We implement our model on two real-world datasets from education practice. Experimental results demonstrate that the proposed model performs outstandingly in both accuracy and extensibility.
Interactive Selection Recommendation Based on the Multi-Head Attention Graph Neural Network
ABSTRACT. Click-through rate prediction is a critical task in recommendation systems, and graph neural networks, as a powerful machine learning method, have been favored by scholars to solve it in recent years. However, most click-through rate prediction models based on graph neural networks model the relationships between features without considering the effectiveness of feature interaction, although not all feature combinations are meaningful. Therefore, this paper proposes a Multi-head attention Graph Neural Network with Interactive Selection, named MGNN_IS for short, to capture complex feature interactions via graph structures. In particular, three sub-graphs are constructed to capture internal information of users and items respectively, as well as interactive information between users and items, namely the user internal graph, the item internal graph, and the user-item graph. Moreover, the proposed model designs a multi-head attention propagation and aggregation module with an interactive selection strategy, which can select from the constructed graphs and increase diversity with multiple heads to achieve high-order interaction across multiple layers. Finally, the proposed model fuses the features to produce the final prediction. Experiments on three public datasets demonstrate that the proposed model outperforms other advanced models.
An End-to-End Dense Connected Heterogeneous Graph Convolutional Neural Network
ABSTRACT. Graph convolutional networks (GCNs) are powerful models for graph-structured data learning tasks. However, most existing GCNs may confront two major challenges when dealing with heterogeneous graphs: (1) Predefined meta-paths are required to capture the semantic relations between nodes of different types, which may not exploit all the useful information in the graph; (2) Performance degradation and semantic confusion may happen with the growth of the network depth, which limits their ability to capture long-range dependencies. To meet these challenges, we propose Dense-HGCN, an end-to-end dense connected heterogeneous graph convolutional neural network for learning node representations. Dense-HGCN computes the attention weights between different nodes and incorporates the information of previous layers into each layer's aggregation process via a specific fuse function. Moreover, Dense-HGCN leverages multi-scale information for node classification or other downstream tasks. Experimental results on real-world datasets demonstrate the superior performance of Dense-HGCN in enhancing representational power compared with several state-of-the-art methods.
Graph Convolutional Network based Feature Constraints Learning for Cross-Domain Adaptive Recommendation
ABSTRACT. The problem of data sparsity is a key challenge for recommendation systems. It motivates the research of cross-domain recommendation (CDR), which aims to use more user-item interaction information from source domains to improve the recommendation performance in the target domain. However, finding useful features to transfer is a challenge for CDR, and avoiding negative transfer while achieving domain adaptation further adds to this challenge. Building on the strength of graph structural feature learning, this paper proposes a graph convolutional network based Cross-Domain Adaptive Recommendation model using Feature Constraints Learning (CDAR-FCL). To begin with, we construct a multi-graph network consisting of single-domain graphs and one cross-domain graph based on overlapping users. Next, we employ specific and common graph convolutions on the graphs to learn domain-specific and domain-invariant features, respectively. Additionally, we design feature constraints on the features obtained from the different graphs and mine the potential correlations for domain adaptation. To address the issue of shared parameter conflicts within the constraints, we develop a binary mask learning approach based on contrastive learning. CDAR-FCL is a domain-adaptive recommendation model that can find useful features to transfer. Experiments on three pairs of real cross-domain datasets demonstrate the effectiveness of CDAR-FCL.
Motif-SocialRec: A Multi-channel Interactive Semantic Extraction Model for Social Recommendation
ABSTRACT. Social recommendation is emerging as a prominent research topic in recommendation systems; it enhances prediction performance by combining user-item interaction information with social relationships between users. Most existing social recommendation models rely on pairwise relationships within the neighborhood structure during representation learning, without capturing and utilizing the high-level connection modes in the information network. Therefore, these models often fail to capture complex interaction semantics beyond pairwise relationships, which are crucial for producing valid recommendation results. To address this issue, this paper discusses social recommendation from the perspective of motifs and proposes a novel recommendation model, namely Motif-SocialRec, which efficiently models interaction patterns from multiple channels with different motifs. In this model, we extract a series of local structures, depicted by motifs, that describe the high-level interactive semantics in the fused network from three views. By employing a hypergraph convolution network conditioned on motifs, representations that preserve potential semantic patterns can be learned under the constraint of social relationships. Additionally, we enhance the learned representations by establishing self-supervised learning tasks at different scales to further explore the inherent characteristics of the network. To produce the final recommendation prediction, a joint optimization model is constructed by integrating the primary and auxiliary tasks. Results of extensive experiments on four real-world datasets show that Motif-SocialRec significantly outperforms baselines in terms of three evaluation metrics, in both common and cold-start settings. Finally, further insight into the explainability of Motif-SocialRec is provided by analysing the recommendation predictions produced for several randomly sampled users.
Curiosity Enhanced Bayesian Personalized Ranking for Recommender Systems
ABSTRACT. Curiosity affects users' selection of items, motivating them to explore items regardless of their preferences. This phenomenon is particularly common in social networks. However, existing social-based recommendation methods neglect users' curiosity in social networks, which may decrease recommendation accuracy. Moreover, focusing only on simulating users' preferences can lead to information cocoons. To tackle the problems above, we propose a Curiosity Enhanced Bayesian Personalized Ranking (CBPR) model for recommender systems. Our proposed model makes full use of theories from psychology to model the curiosity aroused in users when they face different opinions. The experimental results on two public datasets demonstrate the advantages of our CBPR model over existing models.
EWMIGCN: Emotional Weighting based Multimodal Interaction Graph Convolutional Networks for Personalized Prediction
ABSTRACT. To address the challenges of information overload and cold start in personalized prediction systems, researchers have proposed graph neural network-based recommendation methods. However, existing studies have largely overlooked the shared characteristics among different modal features. Moreover, there is a mismatch between the focuses of multimodal feature extraction (MFE) and user preference modeling (UPM). To tackle these issues, this paper establishes an interaction graph by extracting multimodal information and addresses the mismatch between MFE and UPM by constructing an emotion-weighted bisymmetric linear graph convolutional network (EW-BGCN). Specifically, this paper introduces a novel model called EWMIGCN, which combines multimodal information extraction using parallel CNNs to build an interaction graph, propagates the information on the EW-BGCN, and predicts user preferences by summing the representations of users and items through inner-product calculations. Notably, this paper incorporates sentiment information from user comments to finely weight the neighborhood aggregation in the EW-BGCN, enhancing the overall quality of items. Experimental results demonstrate that the proposed model achieves superior performance compared to other baseline models on three datasets, as measured by Hits Ratio and Normalized Discounted Cumulative Gain.
ASTPSI: Allocating Spare Time and Planning Speed Interval for Intelligent Train Control of Sparse Reward
ABSTRACT. When deep reinforcement learning (DRL) is used to solve train operation control in urban railways, it encounters complex and dynamic environments with sparse rewards. It is therefore crucial to alleviate the negative impact of sparse rewards on finding the optimal trajectory. This paper introduces a novel algorithm called Allocating Spare Time and Planning Speed Intervals (ASTPSI), which dramatically reduces the blindness of exploration of intelligent train agents under sparse rewards when using DRL and significantly improves their learning efficiency and operation quality. ASTPSI can generate real-time train trajectories that meet the requirements by combining different DRL algorithms. To evaluate the algorithm's performance, we verified the convergence rate of ASTPSI-DRL in optimizing train trajectories in the face of sparse rewards on a real track. ASTPSI-DRL shows better performance and stability than genetic algorithms and the original DRL algorithms in terms of train energy consumption, punctuality, and stopping accuracy.
Outer Synchronization for Multi-Derivative Coupled Complex Networks with and without External Disturbance
ABSTRACT. This paper investigates the outer synchronization of multi-derivative coupled complex networks (MDCCNs), and further studies the outer $H_{\infty}$ synchronization between two MDCCNs with external disturbance. For the outer synchronization, a synchronization criterion is proposed by using an adaptive control strategy, which is proved based on a Lyapunov functional and Barbalat's lemma. For the outer $H_\infty$ synchronization, an adaptive state controller and a parameter updating scheme are devised for MDCCNs with external disturbance. Finally, the validity of the presented criteria is demonstrated by two simulation examples.
Wasserstein Diversity-Enriched Regularizer for Hierarchical Reinforcement Learning
ABSTRACT. Hierarchical reinforcement learning (HRL) composes subpolicies in different hierarchies to accomplish complex tasks. Automated subpolicy discovery, which does not depend on domain knowledge, is a promising approach to generating subpolicies. However, the degradation problem is a challenge that existing methods can hardly deal with, due to the lack of consideration of diversity or the employment of weak regularizers. In this paper, we propose a novel task-agnostic regularizer called the Wasserstein Diversity-Enriched Regularizer (WDER), which enlarges the diversity of subpolicies by maximizing the Wasserstein distances among their action distributions. The proposed WDER can be easily incorporated into the loss function of existing methods to further boost their performance. Experimental results demonstrate a significant improvement in the diversity of the generated subpolicies when equipped with our WDER, which validates its effectiveness.
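As a simplified illustration of such a diversity term (not the paper's implementation, and assuming one-dimensional continuous actions), the sketch below averages pairwise 1-D Wasserstein distances between action samples drawn from each subpolicy and subtracts the result from the task loss.

```python
from itertools import combinations
import numpy as np
from scipy.stats import wasserstein_distance

def diversity_bonus(action_samples, coef=0.1):
    """action_samples: list of 1-D arrays, one per subpolicy,
    containing actions sampled from that subpolicy."""
    pairs = list(combinations(range(len(action_samples)), 2))
    dists = [wasserstein_distance(action_samples[i], action_samples[j])
             for i, j in pairs]
    # maximizing pairwise Wasserstein distances -> subtract the bonus from the task loss
    return coef * float(np.mean(dists))

# usage (illustrative): total_loss = task_loss - diversity_bonus(samples_per_subpolicy)
```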
Bloomfilter-based Practical Kernelization Algorithm for Minimum Satisfiability
ABSTRACT. The Minimum Satisfiability problem (briefly: given a CNF formula, find an assignment satisfying the minimum number of clauses) has attracted much attention recently. From a theoretical point of view, the Minimum Satisfiability problem is fixed-parameter tractable, by transformation into Vertex Cover. However, such a transformation is time-consuming, taking $O(m^2\cdot n)$ time to transform into Vertex Cover. We first present an $O(m^2)$ algorithm to transform MinSAT into Vertex Cover by utilizing a Bloom filter structure. Then, instead of transforming to Vertex Cover, we present a practical kernelization rule applied directly to the original formula, which takes $O(L\cdot d(F))$ time and yields a kernel of size $k^2+k$.
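For readers unfamiliar with the data structure, a Bloom filter offers constant-time set membership tests with a tunable false-positive rate and no false negatives, which is what makes the faster transformation plausible. The sketch below is a generic Python Bloom filter, not the paper's kernelization code; the bit-array size and hash count are arbitrary.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: probabilistic set membership with no false negatives."""

    def __init__(self, size=1 << 20, num_hashes=4):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = bytearray(size // 8 + 1)

    def _positions(self, item):
        # derive num_hashes independent bit positions from SHA-256 digests
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

# e.g., recording whether a literal pair has already been processed:
seen = BloomFilter()
seen.add(("x1", "-x2"))
print(("x1", "-x2") in seen)   # True (inserted items are never missed)
```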
ABSTRACT. It is well known that Q-learning (QL) suffers from overestimation bias, which is caused by using the maximum action value to approximate the maximum expected action value. To address this issue, the overestimation property of Q-learning has been studied both theoretically and practically. In general, most work on reducing overestimation bias seeks different estimators to replace the maximum estimator in order to mitigate its effect, and such works have achieved some improvement over Q-learning. In this work, we also focus on bias-reduction methods. We consider M sample action values, where each sample is re-estimated using the maximizing actions of the remaining samples. We then select the max and the median member from these re-estimated samples, yielding Bias Reduced Max Q-learning (BRMQL) and Bias Reduced Median Q-learning (BRMeQL), respectively. We first theoretically prove that BRMQL and BRMeQL suffer from underestimation bias and analyze the effect of the number M of Q-functions on the performance of our algorithms. Then we evaluate BRMQL and BRMeQL on benchmark game environments. Finally, we show that BRMQL and BRMeQL underestimate the Q-value less than Double Q-learning (DQL) and perform better than several other algorithms on some benchmark game environments.
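One plausible reading of the cross-estimation step is sketched below for tabular Q-learning: each of the M Q-tables is evaluated at the greedy action of the remaining tables (averaged here), and the max or median of the resulting estimates forms the BRMQL or BRMeQL target. The averaging choice and the function names are assumptions for illustration, not the authors' exact procedure.

```python
import numpy as np

def cross_estimates(q_tables, state):
    """q_tables: list of M arrays of shape (n_states, n_actions).
    For each Q_i, pick the greedy action of the remaining M-1 tables
    (averaged here) and evaluate Q_i at that action."""
    M = len(q_tables)
    estimates = []
    for i in range(M):
        others = np.mean([q_tables[j][state] for j in range(M) if j != i], axis=0)
        a_star = int(np.argmax(others))          # greedy action of the other tables
        estimates.append(q_tables[i][state, a_star])
    return np.array(estimates)

def brmql_target(q_tables, state):
    return cross_estimates(q_tables, state).max()       # max member (BRMQL)

def brmeql_target(q_tables, state):
    return np.median(cross_estimates(q_tables, state))  # median member (BRMeQL)
```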
Interactive Attention-Based Graph Transformer for Multi-Intersection Traffic Signal Control
ABSTRACT. With the exponential growth in motor vehicle numbers, the issue of urban traffic congestion is becoming increasingly severe, and traffic signal control has become a pivotal technology for alleviating congestion. In modeling multiple intersections, most existing studies focus on communication among intersections within a region and rarely consider the correlations between cross-regional intersections. To address this limitation, we construct an interactive attention-based graph transformer network for traffic signal control (GTLight). Specifically, the model learns multiple dependency patterns using a relationship-enhanced interactive attention mechanism and considers the correlations between cross-regional intersections. In addition, the model designs a phase-timing optimization algorithm to solve the problem of Q-value overestimation in signal timing strategies. The model can provide an optimal signal phasing for each intersection based on different traffic states. We validate the effectiveness of GTLight using the CityFlow traffic simulator on synthetic and real-world traffic datasets. Compared with recent methods, the average travel time is improved by 28.16%, 26.56%, 25.79%, 26.46%, and 19.59%, respectively, achieving excellent performance.
Feature-Fusion-Based Haze Recognition in Endoscopic Images
ABSTRACT. Haze generated during endoscopic surgeries significantly obstructs the surgeon's field of view, leading to inaccurate clinical judgments and elevated surgical risks. Identifying whether endoscopic images contain haze is essential for dehazing. However, existing haze image classification approaches usually concentrate on natural images, showing inferior performance when applied to endoscopic images. To address this issue, an effective haze recognition method specifically designed for endoscopic images is proposed. This paper innovatively employs three kinds of features (i.e., color, edge, and dark channel), which are selected based on the unique characteristics of endoscopic haze images. These features are then fused and inputted into a Support Vector Machine (SVM) classifier. Evaluated on clinical endoscopic images, our method demonstrates superior performance: (Accuracy: 98.67%, Precision: 98.03%, and Recall: 99.33%), outperforming existing methods. The proposed method is expected to enhance the performance of future dehazing algorithms in endoscopic images, potentially improving surgical accuracy and reducing surgical risks.
MemFlowNet: A Network for Detecting Subtle Surface Anomalies with Memory Bank and Normalizing Flow
ABSTRACT. Detection of subtle surface anomalies in the presence of strong noise is a challenging vision task. This paper presents a new neural network called MemFlowNet for detecting subtle surface anomalies by combining the advantages of a memory bank and normalizing flow. The proposed method consists of two stages. The first stage achieves pixel-level segmentation of anomalies using noise-insensitive average features in the memory bank and a Nearest Neighbor search strategy, and the second stage achieves image-level detection using normalizing flows and multi-scale score fusion. A new dataset called INSCup has been developed to assist this research by acquiring inner-surface images of stainless steel insulated cups with an ultra-wide lens. The performance of MemFlowNet has been validated on the INSCup dataset, where it surpasses other mainstream methods. In addition, MemFlowNet achieves the best performance, with an image-level AUROC of 99.57%, in anomaly detection on the MVTec-AD benchmark. This shows great potential for applying MemFlowNet to automated visual inspection of surface anomalies.
Spatially-Aware Human-Object Interaction Detection with Cross-Modal Enhancement
ABSTRACT. We propose a novel two-stage HOI detection model that incorporates cross-modal spatial information awareness.
Human-object relative spatial relationships are highly relevant for specific HOI species, but current approaches fail to model such crucial cues explicitly. We observed that relative spatial relationships possess properties that can be described in natural language easily and intuitively.
Building on this observation and inspired by recent advancements in prompt-tuning, we design a Prompt-Enhanced Spatial Modeling (PESM) module that generates linguistic descriptions of spatial relations between humans and objects.
PESM is capable of merging the explicit spatial information obtained by the aforementioned text descriptions with the implicit spatial information of the visual modality. Moreover, we devise a two-stage model architecture that effectively incorporates auxiliary cues to exploit the enhanced cross-modal spatial information.
Extensive experiments conducted on the HICO-DET benchmark demonstrate that the proposed model outperforms state-of-the-art methods, indicating its effectiveness and superiority. The source code is available at https://github.com/liugaowen043/tsce
Oil and Gas Automatic Infrastructure Mapping: Leveraging High-Resolution Satellite Imagery through fine-tuning of object detection models
ABSTRACT. The oil and gas sector is the second largest anthropogenic emitter of methane, which is responsible for at least 25% of current global warming. To curb methane's contribution to climate change, emissions from oil and gas infrastructure must be monitored. Initiatives such as the Methane Alert and Response System (MARS) launched by the United Nations Environment Program aim to pinpoint significant emission events, alert relevant stakeholders, and monitor and track progress in mitigation efforts. However, an automated solution is needed for consistent monitoring across multiple oil and gas basins. In this extended study, we focus on automated identification of oil and gas infrastructure using advanced supervised object detection algorithms such as YOLO, Faster R-CNN, and DETR, fine-tuned on a specifically segmented oil and gas infrastructure database (930 images, 1951 objects). We specifically investigate automatic detection in the Permian Basin of the U.S. using these algorithms refined with our customized high-resolution image database. The tests performed demonstrate the effectiveness of the YOLOv8 model, both with and without pre-training.
A Deep Learning Framework with Pruning RoI Proposal for Dental Caries Detection in Panoramic X-ray Images
ABSTRACT. Dental caries is a prevalent noncommunicable disease that affects over half of the global population. It can significantly diminish individuals' quality of life by impairing their eating and socializing abilities. Consistent dental check-ups and professional oral healthcare are crucial in preventing dental caries and other oral diseases. Deep learning based object detection provides an efficient approach to assist dentists in identifying and treating dental caries. In this paper, we present a deep learning framework with a lightweight pruning region of interest (P-RoI) proposal specifically designed for detecting dental caries in panoramic dental radiographic images. Moreover, this framework can be enhanced with an auxiliary head for label assignment during the training process. By utilizing the Cascade Mask R-CNN model with a ResNet-101 backbone as the baseline, our modified framework with the P-RoI proposal and auxiliary head achieves a notable 3.85 increase in Average Precision (AP) for the dental caries class within our dental dataset.
Generating Pseudo-Labels for Car Damage Segmentation using Deep Spectral Method
ABSTRACT. Car damage segmentation, an integral part of vehicle damage assessment, involves identifying and classifying various types of damage from images of vehicles, thereby enhancing the efficiency and accuracy of assessment processes. This paper introduces an efficient approach for car damage assessment by combining pseudo-labeling and deep learning techniques. The method addresses the challenge of limited labeled data in car damage segmentation by leveraging unlabeled data. Pseudo-labels are generated using a deep spectral approach and refined through merge and flip-bit operations. Two models, i.e., Mask R-CNN and SegFormer, are trained using a combination of ground-truth labels and pseudo-labels. Experimental evaluation on the CarDD dataset demonstrates the superior accuracy of our method, achieving improvements of 12.9% in instance segmentation and 18.8% in semantic segmentation when utilizing a 1/2 ground-truth ratio. In addition to enhanced accuracy, our approach offers several benefits, including time savings, cost reductions, and the elimination of biases associated with human judgment. By enabling more precise and reliable identification of car damage, our method enhances the overall effectiveness of the assessment process. The integration of pseudo-labeling and deep learning techniques in car damage assessment holds significant potential for improving efficiency and accuracy in real-world scenarios.
Detect Overlapping Community via Graph Neural Network and Topological Potential
ABSTRACT. Overlapping community structure is an important characteristic of real complex networks, and the goal of overlapping community detection is to uncover the modular structure using the information contained in the networks. However, most existing methods based on deep learning techniques directly utilize the original network topology or node attributes, ignoring the importance of various kinds of edge information. Inspired by the effective representation learning capability of graph neural networks and the ability of topological potential to measure the intimacy between nodes, we propose a novel model, named DOCGT, for overlapping community detection. This model deconstructs the original graph into a first-order graph and a second-order graph, builds a set of graph neural network modules based on the Bernoulli-Poisson (BP) model, and then uses their advantages to independently learn node embedding representations of different orders. To this end, we introduce the concept of a topological potential matrix, which can not only effectively merge the above embeddings but also integrate abundant edge information into the entire model. The fused embedding matrix can help us obtain the final community structure. Experimental results on real datasets show that our method can effectively detect overlapping community structures.
Adaptive Focal Inverse Distance Transform Maps for Cell Recognition
ABSTRACT. The quantitative analysis of cells is crucial for clinical diagnosis, and effective analysis requires accurate detection and classification. Using point annotations for weakly supervised learning is a common approach for cell recognition, which significantly reduces the labeling workload. Cell recognition methods based on point annotations primarily rely on manually crafted smooth pseudo labels. However, the diversity of cell shapes can render fixed encodings ineffective. In this paper, we propose a multi-task cell recognition framework. The framework utilizes a regression task to adaptively generate smooth pseudo labels with cell morphological features to guide the robust learning of the probability branch, and utilizes an additional branch for classification. Meanwhile, in order to address the issue of multiple high-response points within one cell, we introduce Non-Maximum Suppression (NMS) to avoid duplicate detections. On a bone marrow cell recognition dataset, our method is compared with five representative methods. Compared with the best-performing method, our method achieves improvements of 2.0 and 3.6 in F1 score for detection and classification, respectively.
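To illustrate the duplicate-suppression step on point predictions, here is a minimal greedy point-NMS sketch (a generic routine, not the authors' code); the suppression radius is a hypothetical parameter.

```python
import numpy as np

def point_nms(points, scores, radius):
    """points: (N, 2) candidate cell centres; scores: (N,) response values.
    Greedily keeps the highest-scoring point and suppresses any remaining
    candidate closer than `radius` to it."""
    order = np.argsort(-scores)
    keep = []
    suppressed = np.zeros(len(points), dtype=bool)
    for idx in order:
        if suppressed[idx]:
            continue
        keep.append(idx)
        dists = np.linalg.norm(points - points[idx], axis=1)
        suppressed |= dists < radius   # suppress near-duplicates (includes idx itself)
    return keep
```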
Stereo Visual Mesh for Generating Sparse Semantic Maps at High Frame Rates
ABSTRACT. The Visual Mesh is an input transform for deep learning that allows depth independent object detection at extremely high frame rates. The present study introduces a Visual Mesh based stereo vision method for sparse stereo semantic segmentation. A new dataset of simulated 3D scenes was generated and used for training to show that the method is capable of processing high resolution stereo inputs to generate both left and right sparse semantic maps. The new stereo method demonstrated superior classification accuracy when compared to the corresponding monocular approach. The very high frame rates and high accuracy may make the proposed approach attractive to fast-paced on-board robot or IoT applications.
ABSTRACT. Micro-expressions (MEs) have the characteristics of small motion amplitude and short duration. How to learn discriminative ME features is a key issue in ME recognition. Motivated by the success of the PCB model in person retrieval, this paper proposes a ME recognition method called PCB-PCANet+. Considering that the important information of MEs is mainly concentrated in a few key facial areas such as the eyebrows and eyes, we use multi-branch LSTM networks on top of the output of a shallow PCANet+ to separately learn local spatiotemporal features for each facial ROI region. In addition, in the multi-branch fusion stage, we design a feature weighting strategy according to the significance of different facial regions to further improve the performance of ME recognition. The experimental results on the SMIC and CASME II datasets validate the effectiveness of the proposed method.
Unsupervised Fabric Defect Detection Framework based on Knowledge Distillation
ABSTRACT. Fabric defect detection is a critical task in the textile industry. Efficient and accurate automated detection schemes, such as computer vision fabric quality inspection, are urgently needed. However, traditional feature-based methods are often limited and difficult to implement universal solutions in industrial scenarios due to their specificity towards certain defect types or textures. Meanwhile, machine learning methods may face difficulties in harsh industrial production environments due to insufficient data and labels. To address these issues, we propose an unsupervised defect detection framework based on knowledge distillation, which includes a visual localization module to assist with the detection task. Our approach significantly improves classification and segmentation accuracy compared to previous unsupervised methods. Besides, we perform a comprehensive set of ablation experiments to determine the optimal values of different parameters. Furthermore, our method demonstrates promising performance in both open databases and real industrial scenarios, highlighting its high practical value.
Global Exponential Synchronization of Quaternion-Valued Neural Networks via Quantized Control
ABSTRACT. In this paper, quantization controllers are designed to implement the global exponential synchronization of quaternion-valued neural networks. Firstly, based on Hamilton's principle, the quaternion-valued neural networks are decomposed into four equivalent real-valued neural networks. Then, utilizing the principles of Lyapunov stability and matrix inequality theory, the drive-response synchronization method is utilized to obtain the result on exponential synchronization of quaternion-valued neural networks. Finally, the effectiveness of the proposed method is verified by numerical simulation examples.
Informative Prompt Learning for Low-shot Commonsense Question Answering via Fine-Grained Redundancy Reduction
ABSTRACT. Low-shot commonsense question answering (CQA) poses a big challenge due to the absence of sufficient labeled data and commonsense knowledge. Recent work focuses on utilizing the commonsense reasoning potential of pre-trained language models (PLMs) for low-shot CQA. In addition, various prompt learning methods have been studied to elicit implicit knowledge from PLMs for performance promotion. However, it has been shown that PLMs suffer from the redundancy problem that many neurons encode similar information, especially under a small-sample regime, making prompt learning less informative in low-shot scenarios.
In this paper, we propose an informative prompt learning approach, which aims to elicit more diverse and useful knowledge from PLMs for low-shot CQA via fine-grained redundancy reduction. Specifically, our redundancy-reduction method imposes restrictions at the fine-grained neuron level to encourage each dimension to model different knowledge or clues. Experiments on three benchmark datasets show the great advantages of our proposed approach in low-shot settings. Moreover, we conduct both quantitative and qualitative analyses, which shed light on why our approach can lead to such improvements.
Rethinking unsupervised domain adaptation for nighttime tracking
ABSTRACT. Despite the considerable progress that has been achieved in visual object tracking, it remains a challenge to track in low-light circumstances. Prior nighttime tracking methods suffer from either weak collaboration of cascade structures or the lack of pseudo supervision, and thus fail to bring out satisfactory results. In this paper, we develop a novel unsupervised domain adaptation framework for nighttime tracking. Specifically, we benefit from the establishment of pseudo supervision in the mean teacher network, and further extend it with three components at the input level and the optimization level. For the unlabeled target domain dataset, we first introduce an assignment-based object discovery strategy to generate suitable training patches. Meanwhile, a low-light enhancer is embedded to improve the pseudo labels that facilitate the following consistency learning. Then, with the aid of better training data and pseudo labels, we replace the common mean square error with two stricter losses, which are entropy-decreasing classification consistency loss and confidence-weighted regression consistency loss, for better convergence. Experiments demonstrate that our proposed method achieves significant performance gains on multiple nighttime tracking benchmarks, and even brings slight enhancement on the source domain.
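As a toy illustration of a confidence-weighted regression consistency term in a mean-teacher setup (a simplified sketch with assumed tensor shapes, not the paper's actual loss), each student-teacher box discrepancy is weighted by the teacher's confidence in that pseudo label:

```python
import torch

def confidence_weighted_consistency(student_boxes, teacher_boxes, teacher_conf):
    """student_boxes, teacher_boxes: (N, 4) box regressions for the same patches;
    teacher_conf: (N,) teacher confidences used to weight each consistency term."""
    per_box = ((student_boxes - teacher_boxes) ** 2).mean(dim=-1)   # per-box squared error
    weights = teacher_conf / (teacher_conf.sum() + 1e-8)            # normalized confidences
    return (weights * per_box).sum()
```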
Improving SLDS performance using explicit duration variables with infinite support
ABSTRACT. We improve the segmentation in Switching Linear Dynamical Systems.
We extend the Beam Sampling algorithm to perform efficient inference, allowing for a duration distribution with infinite support.
We conduct experiments on three benchmarks (two already prevalent in the state-space model literature and one demonstrating behavior in a sparse setting) that test the correctness and efficiency of our solution.
Exploring Adaptive Regression Loss and Feature Focusing in Industrial Scenarios
ABSTRACT. Industrial defect detection is designed to detect quality defects in industrial products. However, the surface defects of different industrial products vary greatly, for example in the variety of texture shapes and the complexity of background information. A lightweight Focus Encoder-Decoder Network (FEDNet) is presented to solve these problems. Specifically, the novelty of FEDNet is as follows. First, the feature focusing module (FFM) is designed to focus attention on defect features in complex backgrounds. Secondly, a lightweight texture extraction module (LTEM) is proposed to extract, at low cost, the texture and relative location information of shallow-network defect features. Finally, AZIoU, an adaptive adjustment loss function, reexamines the regression of the prediction box with respect to its perimeter and aspect ratio. Experiments on two industrial defect datasets show that FEDNet achieves an accuracy of 42.86% on Steel and 72.19% on DeepPCB using only 15.3 GFLOPs.
Modeling User's Neutral Feedback in Conversational Recommendation
ABSTRACT. Conversational recommendation systems (CRS) enable traditional recommender systems to obtain fine-grained dynamic user preferences by incorporating interactive conversations. Although CRS has shown success in generating recommendation lists based on user preferences, existing methods restrict users to binary responses, i.e., accept or reject, after each recommendation, which greatly limits users from expressing their needs. In fact, a user's rejection feedback may contain other valuable information. To address this limitation, we refine the user's negative item-level feedback into attribute-level feedback and extend CRS to a more realistic scenario that incorporates not only positive and negative feedback but also neutral feedback. Neutral feedback denotes incomplete satisfaction with recommended items, which can guide the system to infer user preferences and optimize the recommendation. To better cope with this new setting, we propose a conversational recommendation model called Neutral Feedback in Conversational Recommendation (NFCR). We adopt a joint learning task framework for feature extraction and use predefined rules from inverse reinforcement learning to train the decision network, which enables us to make appropriate decisions at each turn. Finally, we utilize the fine-grained neutral feedback from users to acquire their dynamic preferences in the update and deduction module. We conducted comprehensive evaluations on four benchmark datasets to demonstrate the effectiveness of our model.
Jointly extractive and abstractive training paradigm for text summarization
ABSTRACT. Text summarization is a classical task in natural language generation, which aims to generate a concise summary of the original article. Neural networks based on the Encoder-Decoder architecture have made great progress in recent years in generating abstractive summaries with high fluency. However, due to the randomness of the abstractive model during generation, the summaries risk missing important information in the articles. To address this challenge, this paper proposes a jointly trained text summarization model that combines abstractive and extractive summarization. On the one hand, extractive models have higher ROUGE scores but poorer readability; on the other hand, abstractive models can produce a more fluent summary but suffer from the problem of omitting important information in the original text. Therefore, we share the encoder of both models and jointly train them to obtain a text representation that benefits from regularisation. We also add document-level information obtained from the extractive model to the decoder of the abstractive model to improve the abstractive summary. Experiments on the CNN/Daily Mail, PubMed, and arXiv datasets demonstrate the effectiveness of the proposed model.
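A schematic of how one shared encoder can serve both an extractive scorer and an abstractive decoder is sketched below; the module sizes, GRU choice, per-token salience head, and loss weighting are illustrative assumptions rather than the architecture described in the paper.

```python
import torch
import torch.nn as nn

class JointSummarizer(nn.Module):
    """Toy sketch: a shared encoder feeding an extractive scoring head and an
    abstractive decoder; both objectives are optimized jointly."""

    def __init__(self, vocab_size=30000, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)   # shared encoder
        self.extract_head = nn.Linear(dim, 1)                # salience score (extractive branch)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.generator = nn.Linear(dim, vocab_size)          # token logits (abstractive branch)

    def forward(self, src_tokens, tgt_tokens):
        enc_out, enc_state = self.encoder(self.embed(src_tokens))
        ext_logits = self.extract_head(enc_out).squeeze(-1)  # salience per source position
        dec_out, _ = self.decoder(self.embed(tgt_tokens), enc_state)
        abs_logits = self.generator(dec_out)
        return ext_logits, abs_logits

# joint objective (illustrative): loss = alpha * extractive_bce + (1 - alpha) * abstractive_ce
```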
FHSI-GNN: Fusion Hierarchical Structure Information Graph Neural Network for Extractive Long Documents Summarization
ABSTRACT. Extractive text summarization aims to select salient sentences from documents. However, most existing extractive methods struggle to capture inter-sentence relations in long documents. In addition, the hierarchical structure information of the document is ignored. For example, some scientific documents have fixed chapters, and sentences in the same chapter have the same theme. To solve these problems, this paper proposes a Fusion Hierarchical Structure Information Graph Neural Network for Extractive Long Documents Summarization. The model constructs a section node containing sentence nodes and global information according to the document structure. It integrates the hierarchical structure information of the text and uses position information to identify sentences. The section node acts as an intermediary node for information interaction between sentences, which better enriches the relationships between sentences and has higher computational efficiency. Our model has achieved excellent results on two datasets, PubMed and arXiv. Further analysis shows that the hierarchical structure information of documents helps the model select salient content better.
A Three-Stage Framework For Event-Event Relation Extraction with Large Language Model
ABSTRACT. Expanding the parameter count of a large language model (LLM) alone is insufficient to achieve satisfactory outcomes in natural language processing tasks, specifically event extraction (EE), event temporal relation extraction (ETRE), and event causal relation extraction (ECRE). To tackle these challenges, we propose a novel three-stage extraction framework (ThreeEERE) that integrates an improved automatic chain-of-thought prompting (Auto-CoT) with the LLM and is tailored based on a golden rule to maximize event and relation extraction precision. The three stages consist of constructing examples in each category, federating local knowledge to extract relationships between events, and selecting the best answer. Although supervised models dominate these tasks, our experiments on three types of extraction tasks demonstrate that this three-stage approach yields significant results in event extraction and event relation extraction, even surpassing some supervised methods.
Two-Phase Semantic Retrieval For Explainable Multi-Hop Question Answering
ABSTRACT. Explainable Multi-Hop Question Answering (MHQA) requires the ability to reason explicitly across facts to arrive at the answer. The majority of multi-hop reasoning methods concentrate on semantic similarity to obtain the next hops or perform entity-centric inference. However, approaches that ignore the rationales required by a problem can easily lead to blindness in reasoning. In this paper, we propose a two-Phase text Retrieval method with an entity Mask mechanism (PRM), which focuses on the rationale from global semantics along with entity considerations. Specifically, it consists of two components: 1) The rationale-aware retriever is pre-trained via a dual encoder framework with an entity mask mechanism. The learned representations of hypotheses and facts are utilized to obtain the top K candidate core facts by sentence-level dense retrieval. 2) The entity-aware validator determines the reachability of hypotheses and core facts with an entity-granularity sparse matrix. Our experiments on three public datasets in the scientific domain (i.e., OpenbookQA, Worldtree, and ARC-Challenge) demonstrate that the proposed model achieves remarkable performance improvements over existing methods.
Effective Guidance in Zero-Shot Multilingual Translation via Multiple Language Prototypes
ABSTRACT. In a multilingual neural machine translation model that fully shares parameters across all languages, a popular approach is to use an artificial language token to guide translation into the desired target language. However, recent studies have shown that the language-specific signals in prepended language tokens are not adequate to guide MNMT models to translate in the right directions, especially for zero-shot translation (i.e., the off-target translation issue). We argue that the representations of prepended language tokens are overly affected by their context, resulting in potential information loss in the language tokens and insufficient indicative ability. To address this issue, we introduce multiple language prototypes to guide translation into the desired target language. Specifically, we categorize sparse contextualized language representations into a few representative prototypes over the training set, and inject their representations into each individual token to guide the model. Experiments on several multilingual datasets show that our method significantly alleviates the off-target translation issue and improves translation quality in both zero-shot and supervised directions.
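One way such prototypes could be obtained and injected, sketched here purely as an assumption (the paper does not necessarily use k-means or this injection rule): cluster the contextual states of the prepended language token collected over the training set, then add a prototype-derived signal to the source embeddings.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_language_prototypes(token_states, num_prototypes=4):
    """token_states: (num_sentences, hidden) contextual representations of the
    prepended target-language token collected over the training set.
    Returns (num_prototypes, hidden) cluster centroids used as prototypes."""
    km = KMeans(n_clusters=num_prototypes, n_init=10, random_state=0)
    km.fit(token_states)
    return km.cluster_centers_

def inject_prototypes(token_embeddings, prototypes, alpha=0.5):
    """Add the mean prototype to every token embedding of a source sentence
    to strengthen the target-language signal (illustrative injection rule)."""
    signal = prototypes.mean(axis=0, keepdims=True)
    return token_embeddings + alpha * signal
```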
Towards Scalable Feature Selection: An Evolutionary Multitask Algorithm Assisted by Transfer Learning Based Co-surrogate
ABSTRACT. When faced with large-instance datasets, existing feature selection methods based on evolutionary algorithms still face the challenge of high computational cost. To address this issue, this paper proposes a scalable evolutionary algorithm for feature selection on large-instance datasets, namely the transfer learning based co-surrogate assisted evolutionary multitask algorithm (cosEMT). Firstly, we tackle feature selection on large-instance datasets via an evolutionary multitasking framework. Co-surrogate models are constructed to measure the similarity between each auxiliary task and the main task, and knowledge transfer between tasks is realized through instance-based transfer learning. Based on the numerical relationship between the relative and absolute numbers of transferable instances, we propose a novel dynamic resource allocation strategy to make more efficient use of limited computational resources and accelerate evolutionary convergence. Meanwhile, an adaptive surrogate model update mechanism is proposed to balance the exploration and exploitation of the base optimizer embedded in the cosEMT framework. Finally, the proposed algorithm is compared with several state-of-the-art feature selection algorithms on twelve large-instance datasets. The experimental results show that the cosEMT framework can achieve significant acceleration in convergence speed and obtain high-quality solutions. These results verify that cosEMT is a highly competitive method for feature selection on large-instance datasets.
A Two-Stage Network for Segmentation of Vertebrae and Intervertebral Discs: Integration of Efficient Local-Global Fusion Using 3D Transformer and 2D CNN.
ABSTRACT. In the field of computer-aided diagnosis (CAD) for spinal diseases, the fundamental task of multi-label segmentation for vertebrae and intervertebral discs (IVDs) assumes a significant role. However, the distinctive characteristics inherent to the spinal structure pose considerable challenges to the segmentation process, impeding its practical applicability in clinical settings. Convolutional neural networks have been widely used in this task; however, their limited receptive field restricts their capacity to capture extended-range spatial correlations. Consequently, the model's ability to accurately delineate vertebral boundaries is compromised, leading to a notable deterioration in the quality of segmentation outputs. To address this limitation, we propose a novel two-stage convolutional neural network (CNN) framework that incorporates both 3D Transformers and 2D CNNs. By synergistically leveraging the advantages of Transformers in facilitating the integration of long-range dependencies and the ability of CNNs to learn global and local features, our proposed approach exhibits promising potential in enhancing the segmentation performance for vertebrae and intervertebral discs. Moreover, we introduce a graph convolution module into our network architecture to exploit the inherent spatial dependencies present in MRI scans of spinal structures, thereby extracting semantic feature representations and further augmenting the efficacy of segmentation. The evaluation of our proposed method is conducted on the MRSpineSeg Challenge dataset, encompassing T2-weighted MR images. Experimental results affirm the superiority of our approach over representative state-of-the-art methodologies.
A domain knowledge-based semi-supervised pancreas segmentation approach
ABSTRACT. Deep learning-based methods have achieved remarkable results; however, obtaining enough labeled data is time-consuming and labor-intensive. Semi-supervised learning is an effective way to alleviate the dependence on annotated data by exploiting unlabeled data. Existing semi-supervised segmentation works tend to ignore domain knowledge, leading to location and shape bias. In this paper, we propose a semi-supervised medical segmentation method based on domain knowledge. Specifically, prior constraints for different organ sub-regions are used to guide pseudo-label generation for unlabeled data. Then, a bidirectional information flow regularization is designed that further utilizes the pseudo-labels, encouraging the model to align the labeled and unlabeled data distributions. Extensive experiments on the NIH pancreas dataset show that the proposed method achieves Dice scores of 76.23% and 80.76% under 10% and 20% labeled data, respectively, which is superior to other semi-supervised pancreas segmentation methods.
How to support sport management with decision systems? Swimming athletes assessment study case
ABSTRACT. Information systems in sports play an increasingly important role due to the opportunities and benefits they present to sports clubs. The purpose of these systems is to assist in decision-making processes concerning marketing, the selection of training parameters, recovery methods, or the selection of team members, among others. It results in increased interest in decision support systems in this area, allowing clubs to gain an advantage over their competitors.
In this paper, we propose a research approach for creating models to evaluate the performance of swimming athletes based on their physical parameters and the sport level they represent. Due to the uncertainties involved, data represented as Triangular Fuzzy Numbers (TFN) were used in the Fuzzy Technique for Order Preference by Similarity to an Ideal Solution (TOPSIS) method to obtain a ranking of the athletes. A sensitivity analysis for the exclusion of subsequent criteria was also carried out. The results obtained were compared with respect to different choices of the criteria weights. An additional six Fuzzy Multi-Criteria Decision Analysis (MCDA) methods were used for a comprehensive analysis, and the results showed that the proposed averaged ranking is a reasonable solution. The proposed approach can be used to evaluate players from different sports so that sports clubs can recruit athletes with high performance potential.
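For orientation, the core TOPSIS ranking principle can be sketched in a few lines; the snippet below uses crisp scores for brevity, whereas the paper works with Triangular Fuzzy Numbers, and the athlete criteria and weights shown are made-up examples.

```python
import numpy as np

def topsis(matrix, weights, benefit):
    """matrix: (alternatives, criteria) crisp scores;
    weights: criteria weights summing to 1;
    benefit: boolean array, True where larger values are better."""
    norm = matrix / np.linalg.norm(matrix, axis=0)       # vector normalization
    v = norm * weights                                   # weighted normalized matrix
    ideal = np.where(benefit, v.max(axis=0), v.min(axis=0))
    anti = np.where(benefit, v.min(axis=0), v.max(axis=0))
    d_plus = np.linalg.norm(v - ideal, axis=1)           # distance to ideal solution
    d_minus = np.linalg.norm(v - anti, axis=1)           # distance to anti-ideal solution
    return d_minus / (d_plus + d_minus)                  # closeness coefficient

# toy example: 3 athletes, 2 criteria (performance score up, reaction time down)
scores = topsis(np.array([[7.0, 0.8], [6.5, 0.6], [8.0, 0.9]]),
                weights=np.array([0.6, 0.4]),
                benefit=np.array([True, False]))
print(np.argsort(-scores))   # ranking, best first
```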
Graph Attention Hashing via Contrastive Learning for Unsupervised Cross-modal Retrieval
ABSTRACT. Hashing-based cross-modal retrieval maps multi-modal features into binary codes in a common Hamming space. Due to its small storage consumption and high efficiency, hashing has received extensive attention in recent years. However, current research has difficulty constructing a well-defined joint semantic space and providing detailed, in-depth learning guidance. In this paper, Graph Attention Hashing via Contrastive Learning (GAHCL) is proposed to address these issues. First, we use the idea of contrastive learning to generate positive samples, and propose a novel contrastive adjacency matrix through a graph attention network. Specifically, this matrix assigns higher weights to node pairs whose source is the same sample, and lower weights to node pairs that do not match each other. The key semantic features can be captured more carefully and accurately under the influence of the attention weights. In addition, the contrastive loss function is constructed by taking the output features of different modalities of an instance and its generated positive-sample features as a positive sample pair. Extensive experiments on two datasets show that the proposed method significantly outperforms existing competitors.
An Efficient Enhanced-YOLOv5 Algorithm for Multi-scale Ship Detection
ABSTRACT. Ship detection has gained considerable attention from industry and academia. However, due to the diverse range of ship types and complex marine environments, multi-scale ship detection faces great challenges, including low detection rates and high computation time. To solve these issues, we propose an efficient enhanced-YOLOv5 algorithm for multi-scale ship detection. Specifically, to dynamically extract two-dimensional features, we design a MetaAconC-inspired adaptive spatial-channel attention module that reduces the impact of complex marine environments on large-scale ships. In addition, we construct a gradient-refined bounding box regression module to enhance the sensitivity of the loss function gradient and strengthen the feature learning ability, which relieves the issue of uneven horizontal and vertical features in small-scale ships. Finally, a Taylor expansion-based classification module is established, which increases the feedback contribution of the gradient by adjusting the first polynomial coefficient vertically and improves the detection performance of the model on ship objects with few samples. Extensive experimental results confirm the effectiveness of the proposed method.
Double-Layer Blockchain-Based Decentralized Integrity Verification for Multi-Chain Cross-Chain Data
ABSTRACT. With the development of blockchain technology, there are certain bottlenecks in terms of storage, throughput, and latency. To address these issues, many multi-chain architectures have emerged to enable data interoperability among different blockchains through cross-chain techniques. However, in highly distributed cross-chain scenarios, the integrity of cross-chain data can be compromised intentionally or unintentionally. Due to the decentralized nature of blockchain, centralized verification schemes are not feasible, making decentralized cross-chain data integrity verification a critical and challenging problem. In this paper, based on the ideas of "governing the chain by chain" and "double-layer blockchain", we propose a double-layer blockchain-based decentralized integrity verification scheme to solve this problem. We construct a supervision-chain by selecting representative nodes from multiple blockchains, which is responsible for cross-chain data integrity verification and recording the results. Specifically, we elaborate on the consensus process, comprising integrity consensus and block consensus. The integrity consensus stage achieves decentralized data integrity verification, while the block consensus stage packages and records the results from the integrity consensus stage. Furthermore, we design a reputation system and an election algorithm within the supervision-chain. Through security analysis and performance evaluation, we demonstrate the security and effectiveness of our proposed scheme.
Self-Supervised-Enhanced Dual Hierarchical Graph Convolution Network for Social Recommendation
ABSTRACT. Graph convolution networks (GCNs) have made significant progress in the field of recommendation systems in recent years, and many GCN-based frameworks have been applied in social recommendation methods. The essence of social recommendation is modeling user preferences through user social relationships to alleviate the sparsity issue. However, existing GCN-based social recommendation frameworks still have some inherent problems. Firstly, since there are no node attributes available as semantic information in the recommendation task, lightweight graph convolutions that remove feature transformation and the non-linear activation function have become widely applied, with their core lying in the message passing mechanism; social recommendation methods usually apply existing message passing paradigms directly, which have obvious limitations. Secondly, most existing social recommendation frameworks are limited to pairwise relations and unable to effectively extract implicit inter-graph information. To address these issues, we propose a Self-Supervised-Enhanced Dual Hierarchical Graph Convolution Network (SSHGCN). In this framework, we first propose a LEWB message passing paradigm applied to the graph convolution network to train user and item representations. Then we explicitly model, via hypergraphs, the marginal information between the user social graph and the user-item interaction graph, and between the item knowledge graph and the user-item interaction graph. Finally, we construct hierarchical self-supervised signals and unify the self-supervised task and the recommendation task for joint training. Extensive experiments on real-world datasets demonstrate that our method outperforms competitive methods. A thorough ablation study verifies the rationality of the LEWB message passing paradigm and the effectiveness of the hierarchical self-supervised tasks in our framework.
Differential Private (Random) Decision Tree without Adding Noise
ABSTRACT. The decision tree is a typical machine learning algorithm with many expanded variations. However, regarding privacy, few of these variations have reached a practical level due to the difficulty of balancing privacy preservation and performance. In this paper, we propose a method of applying privacy preservation to the (random) decision tree, an expanded decision tree variation proposed by Fan et al. in 2003, to achieve the following goals:
- Model training on data belonging to multiple organizations while concealing these data among the organizations.
- No leakage of training data from trained models.
NDGR: A Noise Divide and Guided Re-labeling Framework for Distantly Supervised Relation Extraction
ABSTRACT. Distant supervision (DS) is widely used in relation extraction to reduce annotation cost, but it suffers from noisy instances. Existing methods typically select reliable instances by relying on potentially noisy labels, which results in selecting many noisy instances or discarding a large number of valuable training instances. In this paper, we propose NDGR, a novel training framework for sentence-level distantly supervised relation extraction. NDGR separates noisy data from the DS-built data by modeling the loss distribution with a Gaussian Mixture Model, then assigns pseudo labels to the noisy data to transform them into useful training data. To alleviate noise in the generated labels, we adopt a guided label generation strategy that uses the updated Relation Extraction Network as a reference to optimize the Label Generation Network. Through iterative execution of noise division and guided label generation, NDGR refines the noisy DS-built data and enhances performance. Extensive experiments on widely used benchmarks demonstrate that our method yields significant improvements in sentence-level evaluation and a strong de-noising effect.
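As a concrete illustration of the noise-division step described above, the minimal sketch below fits a two-component Gaussian Mixture Model to per-instance training losses and treats the low-loss component as clean data. The function name, threshold, and synthetic losses are hypothetical; NDGR's actual networks and data pipeline are not reproduced.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def divide_by_loss(losses, clean_prob_threshold=0.5):
    """Split instances into 'clean' and 'noisy' sets by modeling the
    per-instance loss distribution with a 2-component GMM (a stand-in for
    the divide step; real losses would come from the RE network)."""
    losses = np.asarray(losses, dtype=np.float64).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(losses)
    clean_component = int(np.argmin(gmm.means_.ravel()))  # low-loss component
    p_clean = gmm.predict_proba(losses)[:, clean_component]
    clean_idx = np.where(p_clean >= clean_prob_threshold)[0]
    noisy_idx = np.where(p_clean < clean_prob_threshold)[0]
    return clean_idx, noisy_idx

# toy usage with synthetic losses: most instances low-loss, some high-loss
rng = np.random.default_rng(0)
losses = np.concatenate([rng.normal(0.2, 0.05, 800), rng.normal(1.5, 0.3, 200)])
clean_idx, noisy_idx = divide_by_loss(losses)
print(len(clean_idx), len(noisy_idx))
```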
Inter-modal Fusion Network with Graph Structure Preserving for Fake News Detection
ABSTRACT. The continued spread of fake news on networks threatens the stability and security of society, prompting researchers to focus on fake news detection. The development of social media has made it challenging to detect fake news using only uni-modal information. Existing studies tend to integrate multi-modal information to pursue completeness in information mining. How to eliminate modality differences effectively while capturing structure information from multi-modal data remains a challenging issue. To solve this problem, we propose an Inter-modal Fusion network with Graph Structure Preserving (IF-GSP) approach for fake news detection. An inter-modal cross-layer fusion module is designed to bridge the modality differences by integrating features in different layers between modalities. Intra-modal and cross-modal contrastive losses are designed to enhance the inter-modal semantic similarity while focusing on modal-specific discriminative representation learning. A graph structure preserving module is designed to make the learned features fully perceive the graph structure information based on a graph convolutional network (GCN). A multi-modal fusion module utilizes an attention mechanism to adaptively integrate cross-modal feature representations. Experiments on two widely used datasets show that IF-GSP significantly outperforms related multi-modal fake news detection methods.
Learning to Match Features with Geometry-aware Pooling
ABSTRACT. Finding reliable and robust correspondences across images is a fundamental and crucial step for many computer vision tasks, such as 3D reconstruction and virtual reality. However, previous studies still struggle in challenging cases, including large view changes, repetitive patterns and textureless regions, because they neglect geometric constraints during feature encoding. Accordingly, we propose GPMatcher, a novel matcher designed to introduce geometric constraints and guidance into the feature encoding process. To achieve this goal, we compute camera poses from the corresponding features in each attention layer and adopt geometry-aware pooling to reduce redundant information in the next layer. By these means, an iterative geometry-aware pooling and pose estimation pipeline is constructed, which avoids updating redundant features and reduces the impact of noise. Experiments conducted on a range of evaluation benchmarks demonstrate that our method improves matching accuracy and achieves state-of-the-art performance.
Multi-Scale Information Fusion Combined with Residual Attention for Text Detection
ABSTRACT. Driven by deep learning and neural networks, text detection technology has made further progress. Due to the complexity and diversity of scene text, detecting text of arbitrary shapes remains a challenging task. Previous segmentation-based text detection methods can hardly solve the problem of missed detection in complex scenes. In this paper, we propose a text detection model that combines residual attention with a multi-scale information fusion structure to effectively capture text information in natural scenes and avoid text omission. Specifically, the multi-scale information fusion structure extracts text features from different levels to achieve better text localisation and facilitate the fusion of text information. At the same time, residual attention is combined with features from high-resolution images to enhance the contextual information of the text and avoid missing text. Finally, text instances are obtained by a binarisation method. This method is very helpful for text detection in complex scenes. Experiments conducted on three public benchmark datasets show that the method achieves state-of-the-art performance.
DAMFormer: Enhancing Polyp Segmentation through Dual Attention Mechanism
ABSTRACT. Polyp segmentation has been a challenging problem for researchers because polyps do not have a specific shape, color, or size. Traditional deep learning models based on convolutional neural networks (CNNs) struggle to generalize well on unseen datasets. However, the Transformer architecture has shown promising potential in addressing medical problems by effectively capturing long-range dependencies through self-attention. This paper introduces DAMFormer, a Transformer-based model that targets high accuracy while remaining lightweight. DAMFormer utilizes a Transformer encoder to extract better global information. The Transformer outputs are strategically fed into the ConvBlock and the Enhanced Dual Attention Module to effectively capture high-frequency and low-frequency information. These outputs are further processed through the Effective Feature Fusion module to combine global and local features efficiently. In our experiments, five standard benchmark datasets were used: Kvasir, CVC-ClinicDB, CVC-ColonDB, CVC-T, and ETIS-Larib.
Customized Anchors Can Better Fit the Target in Siamese Tracking
ABSTRACT. Most existing siamese trackers rely on a set of fixed anchors to estimate the scale and aspect ratio of all targets. However, in real tracking, different targets have different sizes and shapes, and these predefined anchors cannot cover all possible scales and aspect ratios arising from various movements and deformations, so an adaptive scale and aspect ratio estimation method is needed for robust online tracking. In this paper, a customized anchor generation module is first proposed to estimate the shape of the target and generate customized anchors adapted to it. Then, through an anchor adaptation module, the information of each anchor is embedded into the corresponding feature to learn more discriminative features. Finally, we design a target-aware feature correlation module to reduce the interference of background information. It takes the region of interest of the template as the variable template and its central subregion as the central template, and then performs global and local correlation operations, respectively. Experiments on benchmarks including OTB100, VOT2019, LaSOT, UAV123, and VOT2018 show that our tracker achieves promising performance.
Topological Dynamics of Functional Neural Network Graphs During Reinforcement Learning
ABSTRACT. This study investigates the topological structures of neural network activation graphs, with a focus on detecting higher-order Betti numbers during reinforcement learning. The paper presents visualisations of the neurotopological dynamics of reinforcement learning agents both during and after training, which are useful for the different dynamics analyses explored in this work. Two applications are considered: frame-by-frame analysis of agent neurotopology and tracking per-neuron presence in cavity boundaries over training steps. The experimental analysis suggests that higher-order Betti numbers found in a neural network's functional graph can be associated with learning more complex behaviours.
Retrieval-augmented GPT-3.5-based Text-to-SQL Framework with Sample-aware Prompting and Dynamic Revision Chain
ABSTRACT. Text-to-SQL aims at generating SQL queries for given natural language questions, thus helping users query databases. Prompt learning with large language models (LLMs) has emerged as a recent approach, which designs prompts to lead LLMs to understand the input question and generate the corresponding SQL. However, it faces challenges with strict SQL syntax requirements. Existing work prompts the LLMs with a list of demonstration examples (i.e., question-SQL pairs) to generate SQL, but such fixed prompts can hardly handle scenarios where the semantic gap between the retrieved demonstrations and the input question is large. In this paper, we propose a retrieval-augmented prompting method for an LLM-based Text-to-SQL framework, involving sample-aware prompting and a dynamic revision chain. Our approach incorporates sample-aware demonstrations, which include the composition of SQL operators and fine-grained information related to the given question. To tackle the problem of different expressions conveying the same SQL intention, we propose two strategies for assisting retrieval. Firstly, we leverage LLMs to simplify the original questions, unifying the syntax and thereby clarifying the users' intentions. To generate executable and accurate SQL without human intervention, we design a dynamic revision chain that iteratively adapts fine-grained feedback from the previously generated SQL. Experimental results on three Text-to-SQL benchmarks demonstrate the superiority of our method over strong baseline models.
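To make the prompting idea concrete, the sketch below assembles a demonstration-augmented prompt from retrieved question-SQL pairs. The prompt template, schema, and example pairs are hypothetical; the paper's exact prompt format and retrieval machinery are not shown.

```python
def build_prompt(question, demonstrations, schema):
    """Assemble a Text-to-SQL prompt from retrieved demonstrations
    (hypothetical template; real systems would also include feedback
    from previously generated SQL for revision)."""
    parts = [f"-- Database schema:\n{schema}\n"]
    for demo_question, demo_sql in demonstrations:
        parts.append(f"-- Question: {demo_question}\n{demo_sql}\n")
    parts.append(f"-- Question: {question}\n")
    return "\n".join(parts)

# toy retrieved demonstrations and schema
demos = [
    ("How many singers are there?", "SELECT count(*) FROM singer;"),
    ("List the names of singers older than 30.",
     "SELECT name FROM singer WHERE age > 30;"),
]
schema = "CREATE TABLE singer (id INT, name TEXT, age INT);"
print(build_prompt("What is the average age of singers?", demos, schema))
```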
Rumor Detection with Supervised Graph Contrastive Regularization
ABSTRACT. Rumors generated on social networks can spread quickly and have a serious impact on social stability and residents' daily lives. Recently, rumor detection methods based on the feedback information and propagation structure generated during user interaction have received attention. Most rumors with salient features can be effectively distinguished by graphical models employing cross-entropy loss. However, these traditional models may generalize poorly and lack robustness in the face of noisy reviews and mislabeled samples containing malicious fabrications. In this paper, we propose a novel Supervised Graph Contrastive Regularization (SGCR) method to deal with these complex situations, in which label information is used for supervised contrastive learning by applying simple regularization to the embedding variance of each dimension separately. To explicitly avoid the collapse problem, session threads belonging to the same class are pulled together in the embedding space, while session threads from different classes are pushed apart. When updating model parameters through backpropagation, we reduce gradient conflicts between different tasks through gradient projection. Experimental results on two real-world datasets demonstrate that SGCR performs better than baseline methods.
PoShapley-BCFL: A fair and robust decentralized federated learning based on blockchain and the proof of Shapley-value
ABSTRACT. Recently, blockchain-based federated learning (BCFL) has emerged as a promising technology for promoting data sharing in the Internet of Things (IoT) without relying on a central authority, while ensuring data privacy, security, and traceability. However, it remains challenging to design a decentralized and appropriate incentive scheme that promises fair and efficient contribution evaluation for participants while defending against low-quality data attacks. Although Shapley-value (SV) methods have been widely adopted in FL due to their ability to quantify individuals' contributions, they rely on a central server for calculation and incur high computational costs, making them impractical for decentralized and large-scale BCFL scenarios. In this paper, we design and evaluate PoShapley-BCFL, a new blockchain-based FL approach that accommodates both contribution evaluation and defense against inferior data attacks. Specifically, we propose PoShapley, a Shapley-value-enabled blockchain consensus protocol tailored to support fair and efficient contribution assessment in PoShapley-BCFL. It mimics the Proof-of-Work mechanism by allowing all participants to compute contributions in parallel based on an improved lightweight SV approach. Building on the PoShapley protocol, we further design a fair-robust aggregation rule to improve the robustness of PoShapley-BCFL against inferior data attacks. Extensive experimental results validate the accuracy and efficiency of PoShapley in terms of distance and time cost, and also demonstrate the robustness of PoShapley-BCFL.
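For readers unfamiliar with Shapley-value contribution evaluation, the sketch below shows a generic Monte Carlo approximation over sampled participant permutations. The utility function and participant qualities are toy assumptions; the paper's improved lightweight SV scheme and its consensus integration are not reproduced here.

```python
import random

def monte_carlo_shapley(players, utility, num_permutations=200, seed=0):
    """Approximate each participant's Shapley value by averaging marginal
    utility gains over random join orders (generic sketch, not PoShapley)."""
    rng = random.Random(seed)
    values = {p: 0.0 for p in players}
    for _ in range(num_permutations):
        order = players[:]
        rng.shuffle(order)
        coalition = []
        prev_utility = utility(frozenset(coalition))
        for p in order:
            coalition.append(p)
            cur_utility = utility(frozenset(coalition))
            values[p] += (cur_utility - prev_utility) / num_permutations
            prev_utility = cur_utility
    return values

# toy utility: validation-accuracy gain proportional to each client's data quality
quality = {"A": 0.5, "B": 0.3, "C": 0.1}
players = list(quality)
utility = lambda coalition: sum(quality[p] for p in coalition)
print(monte_carlo_shapley(players, utility))  # ~ {'A': 0.5, 'B': 0.3, 'C': 0.1}
```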
Dendritic Neural Regression Model Trained by Chicken Swarm Optimization Algorithm for Bank Customer Churn Prediction
ABSTRACT. Recently, banks and other enterprises have been constantly facing the problem of customer churn. As an important component of Customer Relationship Management, predicting customer churn has become increasingly urgent. Inspired by biological neurons, we build a dendritic neural network model with four layers, namely a synaptic layer, a dendritic layer, a membrane layer and a soma layer, for bank customer churn prediction. The Chicken Swarm Optimization (CSO) algorithm is used as one of our training algorithms. In this paper, we propose a novel dendritic neural model called CSO-DNM, and experimental results are reported on a benchmark dataset from Kaggle. Compared with other algorithms and models, the proposed model achieves the highest accuracy and the fastest convergence in customer churn prediction.
BERT-LBIA: A BERT-Based Late Bidirectional Interaction Attention Model for Legal Case Retrieval
ABSTRACT. Most legal case retrieval methods rely on pre-trained language models like BERT, which can be slow and inaccurate. Alternatively, representation-based models provide quick responses but may not be the most accurate. To address these issues, our paper proposes a BERT-based late bidirectional interaction attention model for similar legal case retrieval. We use a dual BERT model as our backbone network to obtain feature representations of a query and its case candidates. Then, we develop a bidirectional interaction attention network to generate deep interactive attention signals between the query and its corresponding case candidates. Our experiments show that our model is faster and more accurate than existing retrieval models.
ML2FNet: A Simple but Effective Multi-Level Feature Fusion Network for Document-Level Relation Extraction
ABSTRACT. Document-level relation extraction presents new challenges compared to its sentence-level counterpart, as it aims to extract relations across multiple sentences. Current graph-based and transformer-based models have achieved some success. However, most approaches focus only on local information about individual entities, without considering the global interdependency among relational triples. To solve this problem, this paper proposes a novel relation extraction model with a Multi-Level Feature Fusion Network (ML2FNet). Specifically, we first establish the interaction between entities by constructing an entity-level relation matrix. Then, we employ an enhanced U-shaped network to fuse the multi-level features of entity pairs from local to global. Finally, the relation classification of entity pairs is performed by a bilinear classifier. We conduct experiments on three public document-level relation extraction datasets, and the results show that ML2FNet outperforms the other baselines.
Implicit Clothed Human Reconstruction Based on Self-attention and SDF
ABSTRACT. Recently, implicit function-based approaches have advanced 3D human reconstruction from a single-view image. However, previous methods suffer from problems such as artifacts, broken limbs, and loss of surface details when dealing with challenging poses. To address these issues, this paper proposes a novel neural network model based on PaMIR. Firstly, the Signed Distance Function (SDF) is introduced to define the implicit function, which improves the generalization ability of the model. Secondly, a feature volume encoding network with a self-attention mechanism is designed to extract voxel-aligned features and provide richer geometric information, further improving the accuracy of the shape topology. On the CAPE dataset, our method reduces the Chamfer and Point-to-Surface distances of PaMIR by 50.9% and 30.6% respectively, and normal estimation errors by 18.2%.
Improving GNSS-R Sea Surface Wind Speed Retrieval from FY-3E Satellite Using Multi-Task Learning and Physical Information
ABSTRACT. Global Navigation Satellite System Reflectometry (GNSS-R) technology has great advantages over traditional satellite remote sensing detection of sea surface wind field in terms of cost and timeliness. It has attracted increasing attention and research from scholars around the world. This paper focuses on the Fengyun-3E (FY-3E) satellite, which carries the GNOS II sensor that can receive GNSS-R signals. We analyze the limitations of the conventional sea surface wind speed retrieval method and the existing deep learning model for this task, and propose a new sea surface wind speed retrieval model for FY-3E satellite based on a multi-task learning (MTL) network framework. The model uses the forecast product of Hurricane Weather Research and Forecasting (HWRF) model as the label, and inputs all the relevant information of Delay-Doppler Map (DDM) in the first-level product into the network for comprehensive learning. We also add wind direction, U wind and V wind physical information as constraints for the model. The model achieves good results in multiple evaluation metrics for retrieving sea surface wind speed. On the test set, the model achieves a Root Mean Square Error (RMSE) of 2.5 and a Mean Absolute Error (MAE) of 1.85. Compared with the second-level wind speed product data released by Fengyun Satellite official website in the same period, which has an RMSE of 3.37 and an MAE of 1.9, our model improves the performance by 52.74% and 8.65% respectively, and obtains a better distribution.
An Attack Entity Deducing Model for Attack Forensics
ABSTRACT. The forensics of Advanced Persistent Threat (APT) attacks, known for their prolonged duration and utilization of multiple attack methods, require extensive log analysis to discern their attack steps. Facing the massive amount of data, researchers have increasingly turned to extended machine learning methods to enhance attack forensics. However, the limited number of attack samples used for training and the inability of the data to accurately represent real-world scenarios pose significant challenges. To address these issues, we propose ASAI, an attack deduction model that leverages auxiliary strategies and dynamic word embeddings. Firstly, ASAI tackles the problem of data imbalance through a sequence sampling method enhanced by a custom auxiliary strategy. Subsequently, the sequences are transformed into dynamic vectors using dynamic word embedding. The model is trained to capture the spatio-temporal characteristics of entities under diverse contextual conditions by employing these dynamic vectors. In this paper, ASAI is evaluated using ten real-world APT attacks executed within an actual virtual environment. The results demonstrate ASAI's ability to successfully recover the key steps of the attacks and construct attack stories, achieving an impressive F1 score of up to 99.70%—a significant 16.98% improvement over the baseline.
Small-World Echo State Networks for Nonlinear Time-Series Prediction
ABSTRACT. Echo state networks (ESNs) are an efficient paradigm for training recurrent neural networks (RNNs). However, ESNs sometimes suffer from poor performance and robustness due to the non-trainable reservoir. This paper proposes a novel computational framework for ESNs, where a small-world network is applied as the reservoir topology, a biologically plausible unsupervised learning method named the dual-threshold Bienenstock-Cooper-Munro (DTBCM) learning rule is applied to adjust reservoir weights adaptively, and a recursive least squares (RLS) algorithm equipped with memory regressor extension is applied to update readout weights. The proposed model is compared with several kinds of ESNs on two benchmark nonlinear time-series datasets, the 10th-order nonlinear autoregressive moving average (NARMA10) system and the Mackey-Glass system. Simulation results show that the proposed model not only achieves the best prediction performance but also exhibits remarkable robustness against noise.
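The reservoir-topology idea can be illustrated with a minimal sketch: a Watts-Strogatz small-world graph is used as the ESN reservoir and rescaled to a target spectral radius, then driven by an input sequence. Parameter values are illustrative; the DTBCM weight adaptation and RLS readout from the paper are not included.

```python
import numpy as np
import networkx as nx

def small_world_reservoir(n=300, k=6, p=0.1, spectral_radius=0.9, seed=0):
    """Build a reservoir whose topology is a Watts-Strogatz small-world graph,
    assign random signed weights to its edges, and rescale the matrix so its
    spectral radius matches the target (a common ESN heuristic)."""
    graph = nx.watts_strogatz_graph(n, k, p, seed=seed)
    rng = np.random.default_rng(seed)
    w = nx.to_numpy_array(graph)                    # 0/1 adjacency
    w *= rng.uniform(-1.0, 1.0, size=w.shape)       # random signed edge weights
    radius = np.max(np.abs(np.linalg.eigvals(w)))
    return w * (spectral_radius / radius)

def esn_states(w_res, w_in, inputs, leak=1.0):
    """Run the reservoir over a 1-D input sequence and collect its states."""
    x = np.zeros(w_res.shape[0])
    states = []
    for u in inputs:
        x = (1 - leak) * x + leak * np.tanh(w_res @ x + w_in * u)
        states.append(x.copy())
    return np.array(states)

w_res = small_world_reservoir()
w_in = np.random.default_rng(1).uniform(-0.5, 0.5, w_res.shape[0])
u = np.sin(np.linspace(0, 20, 200))                 # toy input signal
print(esn_states(w_res, w_in, u).shape)             # (200, 300)
```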
Zero-shot Relation Triplet Extraction via Retrieval-Augmented Synthetic Data Generation
ABSTRACT. In response to the challenge that existing relation triplet extraction models struggle to adapt to new relation categories in zero-shot scenarios, we propose a method that combines generated synthetic training data with the retrieval of relevant documents through a rank-based filtering approach for data augmentation. This approach alleviates the problem of low-quality synthetic training data and reduces noise that may affect the accuracy of triplet extraction in certain relation categories. Experimental results on two public datasets demonstrate that our model exhibits stable and impressive performance compared to the baseline models in terms of precision, recall, and F1 score, resulting in improved effectiveness for zero-shot relation triplet extraction.
Parallelizable Simple Recurrent Units with Hierarchical Memory
ABSTRACT. Recurrent neural networks and their many variants have been widely used in language modeling, text generation, machine translation, speech recognition and so forth, due to their excellent ability to process sequence data. However, these networks are constructed by multi-layer stacking, which makes memory-dependent information from the distant past decay continuously. To this end, this paper proposes a parallelizable simple recurrent unit with hierarchical memory (PSRU-HM) to preserve more long-term historical information for inference. This is achieved by a nested SRU structure, which realizes information interaction between the inner and outer memory cells through connections between the inner and outer layers. The depth of the network can be dynamically adjusted according to task complexity. Meanwhile, a skip connection that combines high-level and low-level features is added to the outermost layer, maximizing the utilization of effective input information. To accelerate training and inference, the weights of PSRU-HM are reorganized to enable parallel deployment in the CUDA framework. Extensive experiments on several public datasets, covering text classification, language modeling and question answering, show that PSRU-HM outperforms traditional methods and achieves a 2× speed-up compared to the cuDNN-optimized LSTM.
Incorporating Syntactic Cognitive in Multi-granularity Data Augmentation for Chinese Grammatical Error Correction
ABSTRACT. Chinese grammatical error correction (CGEC) has recently attracted considerable attention due to its real-world value. Current mainstream approaches are all data-driven, but the following flaws still exist. First, there is little high-quality training data with complex and varied errors, and data-driven approaches frequently fail to improve performance significantly because of this lack of data. Second, existing data augmentation methods for CGEC mainly focus on word-level augmentation and ignore syntactic-level information. Third, current data augmentation methods are highly randomized, and few of them fit the pattern of students' cognition of grammatical errors. In this paper, we propose a novel multi-granularity data augmentation method for CGEC. We construct a syntactic error knowledge base for the error types Missing and Redundant Components, and syntactic conversion rules for the error type Improper Word Order, based on a finely labeled syntactic structure treebank. Additionally, we compile a knowledge base of character and word errors from actual student essays. Then, a data augmentation algorithm incorporating character, word, and syntactic noise is designed to build the training set. Extensive experiments and detailed analysis show that our method achieves an F0.5 of 36.77% on the test set, a 6.2% improvement over the best model in the NLPCC Shared Task, proving its validity.
Long Short-Term Planning for Conversational Recommendation Systems
ABSTRACT. In Conversational Recommendation Systems (CRS), the central question is how the conversational agent can naturally ask for user preferences and provide suitable recommendations. Existing works mainly follow a hierarchical architecture, where a higher policy decides whether to invoke the conversation module (to ask questions) or the recommendation module (to make recommendations). This architecture prevents the two components from fully interacting with each other. In contrast, this paper proposes a novel architecture, the long short-term feedback architecture, to connect these two essential components in CRS. Specifically, the recommendation module predicts the long-term recommendation target based on the conversational context and the user history. Driven by the targeted recommendation, the conversational model predicts the next topic or attribute to verify whether the user preference matches the target. This feedback loop continues until the short-term planner output matches the long-term planner output, that is, when the system should make the recommendation.
6D Object Pose Estimation with Attention Aware Bi-Gated Fusion
ABSTRACT. Accurate object pose estimation is a prerequisite for successful robotic grasping tasks. Currently, keypoint-based pose estimation methods using RGB-D data have shown promising results in simple environments. However, how to fuse the complementary features of RGB-D data is still a challenging task. To this end, this paper proposes a two-branch network with an attention aware bi-gated fusion (A2BF) module for keypoint-based 6D object pose estimation, named A2BNet for short. The A2BF module consists of two key components: a bidirectional gated fusion module and an attention mechanism module. The former selectively filters and fuses RGB and point cloud information, and the latter prioritizes essential information while disregarding irrelevant details. Several A2BF modules can be embedded in the network to generate complementary texture and geometric information. Extensive experiments are conducted on the public LineMOD and Occlusion LineMOD datasets. Experimental results demonstrate that the average accuracy of the proposed method reaches 99.8\% and 67.6\% on the two datasets respectively, outperforming state-of-the-art methods.
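The bidirectional gating idea can be sketched as two sigmoid gates, each computed from one modality and applied to the other, before concatenation. This is a simplified illustration under assumed tensor shapes; the actual A2BF module also includes the attention component described in the abstract.

```python
import torch
import torch.nn as nn

class BiGatedFusion(nn.Module):
    """Bidirectional gated fusion of per-point RGB and geometric features:
    each modality is modulated by a sigmoid gate computed from the other.
    A simplified sketch, not the paper's exact A2BF design."""
    def __init__(self, dim):
        super().__init__()
        self.gate_rgb = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())
        self.gate_geo = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, f_rgb, f_geo):
        f_rgb_out = f_rgb * self.gate_rgb(f_geo)   # geometry gates appearance
        f_geo_out = f_geo * self.gate_geo(f_rgb)   # appearance gates geometry
        return torch.cat([f_rgb_out, f_geo_out], dim=-1)

fusion = BiGatedFusion(dim=64)
f_rgb = torch.rand(1, 1024, 64)   # (batch, points, channels); hypothetical sizes
f_geo = torch.rand(1, 1024, 64)
print(fusion(f_rgb, f_geo).shape)  # torch.Size([1, 1024, 128])
```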
Time-Frequency Transformer: A Novel Time Frequency Joint Learning Method for Speech Emotion Recognition
ABSTRACT. In this paper, we propose a novel time-frequency joint learning method for speech emotion recognition, called the Time-Frequency Transformer. Its advantage is that it can excavate global emotion patterns in the time-frequency domain of the speech signal while modeling the local emotional correlations in the time domain and frequency domain respectively. For this purpose, we first design a Time Transformer and a Frequency Transformer to capture the local emotion patterns between frames and inside frequency bands respectively, ensuring the integrity of emotion information modeling in both domains. Then, a Time-Frequency Transformer is proposed to mine the time-frequency emotional correlations from the local time-domain and frequency-domain emotion features, learning a more discriminative global speech emotion representation. The whole process is a time-frequency joint learning process implemented by a series of Transformer models. Experiments on the IEMOCAP and CASIA databases indicate that our proposed method outperforms state-of-the-art methods.
MOOCs Dropout Prediction via Classmates Augmented Time-Flow Hybrid Network
ABSTRACT. Massive Open Online Courses (MOOCs) provide learners with a platform for free learning. However, MOOCs have been criticized for high dropout rates in recent years. To predict users' potential dropout risk in advance, a novel framework named Classmates Augmented Time-Flow Hybrid Network (CA-TFHN) is proposed in this paper. TFHN, which absorbs the advantages of LSTM and the self-attention mechanism, is designed to generate users' activity features from their learning records. At the same time, an effective correlation calculation is defined based on users' potential interests in courses via link prediction, bringing in classmate relationships. The influence among classmates, modeled by a reconstructed user graph, is employed to augment users' activity features, resulting in accurate dropout prediction. Experiments on the XuetangX dataset demonstrate the effectiveness of CA-TFHN in predicting MOOC dropout. The code of CA-TFHN is available at https://github.com/codeds27/CA-TFHN.
ABSTRACT. Graph structure learning (GSL), which aims to optimize the graph structure and learn suitable parameters of graph neural networks (GNNs) simultaneously, has shown great potential in boosting the performance of GNNs. As a branch of GSL, multi-view methods mainly learn an optimal graph structure (final view) from multiple information sources (basic views). However, the structural information of basic views is insufficient, and existing methods ignore the fact that different views can complement each other. Moreover, existing methods obtain the final view through simple combination and fail to constrain noise, which inevitably brings in irrelevant information. To tackle these problems, we propose a Gated Bi-View GSL architecture, named GBV-GSL, which interacts two basic views through a selection gating mechanism, so as to "turn off" noise as well as supplement insufficient structures. Specifically, two basic views that focus on different knowledge are extracted from the original graph as the two inputs of the model. Furthermore, we propose a novel view interaction technique based on the selection gating mechanism to remove redundant structural information and supplement insufficient topology while retaining the focused knowledge of each view. Finally, we design a view attention fusion mechanism to adaptively fuse the two interacted views into the final view. In numerical experiments involving both clean and attacked conditions, GBV-GSL shows significant improvements in the effectiveness and robustness of structure learning and node representation learning. Code is available at https://github.com/Simba9257/GBV-GSL.
Efficient Lightweight Network with Transformer-based Distillation for Micro-crack Detection of Solar Cells
ABSTRACT. Micro-cracks on solar cells often affect power generation efficiency, so this paper proposes a lightweight network for the cell-image micro-crack detection task. Firstly, a Feature Selection framework is proposed, which efficiently and adaptively decides the number of layers of the feature extraction network and clips unnecessary feature generation. In addition, based on the design of the Transformer layer, Transformer Distillation is proposed; its Transformer Refine module excavates distillation information along the two dimensions of features and relations. Using a combination of Feature Selection and Transformer Distillation, lightweight networks based on ResNet and ViT achieve much better results than the original networks, with classification accuracy rates of 88.58% and 89.35% respectively.
MTLAN: Multi-Task Learning and Auxiliary Network for Enhanced Sentence Embedding
ABSTRACT. The objective of cross-lingual sentence embedding learning is to map sentences into a shared representation space, where semantically similar sentence representations are closer together, while distinct sentence representations exhibit clear differentiation. This paper proposes a novel sentence embedding model called MTLAN, which incorporates multi-task learning and auxiliary networks. The model utilizes the LaBSE model for extracting sentence features and undergoes joint training on tasks related to sentence semantic representation and distance measurement. Furthermore, an auxiliary network is employed to enhance the contextual expression of words within sentences. To address the issue of limited resources for low-resource languages, we construct a pseudo-corpus dataset using a multilingual dictionary for unsupervised learning. We conduct experiments on multiple publicly available datasets, including STS and SICK, to evaluate both monolingual sentence similarity and cross-lingual semantic similarity. The empirical results demonstrate the significant superiority of our proposed model over state-of-the-art methods.
ABSTRACT. Human Activity Recognition (HAR) using data from Inertial Measurement Unit (IMU) sensors has practical applications in healthcare and assisted living environments. However, its use in real-world scenarios has been limited by the lack of comprehensive IMU-based HAR datasets covering various activities. Zero-shot HAR (ZS-HAR) can overcome these data limitations. However, most existing IMU-based ZS-HAR methods rely on attributes or word embeddings of class labels as auxiliary data to relate the seen and unseen classes. This approach requires expert knowledge and lacks motion-specific information. In contrast, videos depicting various human activities provide valuable information for ZS-HAR based on inertial sensor data, and they are readily available. Our proposed model, TEZARNet: TEmporal Zero-shot Activity Recognition Network, uses videos as auxiliary data and employs a bidirectional long short-term memory IMU encoder to exploit temporal information, distinguishing it from current work. The proposed model improves on the state-of-the-art accuracy by 4.7%, 7.8%, 3.7%, and 9.3% for the benchmark datasets PAMAP2, DaLiAc, UTD-MHAD, and MHEALTH, respectively.
ABSTRACT. Spatial Transcriptomics (ST) quantitatively interprets human diseases by providing the gene expression of each fine-grained spot (i.e., window) in a tissue slide. This paper focuses on predicting gene expression at windows on a tissue slide image of interest. However, gene expression related to image features usually exhibits diverse spatial scales. To spatially model these features, we introduce the Hierarchical Sparse Attention Network (HSATNet). The core idea of HSATNet is to apply two levels of sparse attention, coarse (i.e., area) and fine (i.e., window). Each HSAT block consists of two main modules: i) adaptive sparse coarse attention, which filters out the most irrelevant areas to acquire adaptive sparse areas, and ii) adaptive sparse fine attention, which filters out the most irrelevant windows to acquire adaptive sparse windows. The first module captures the commonality of different windows within the same area, and the second captures the differences between different windows. After fusing these two modules, without any additional training data or pre-training, experiments conducted on 10X Genomics breast cancer data show that HSATNet achieves an impressive PCC@S of 7.43 for gene expression prediction, exceeding the current state-of-the-art model. Code is available at https://github.com/biyecc/HSATNet.
Interpreting Decision Process in Offline Reinforcement Learning for Interactive Recommendation Systems
ABSTRACT. Recommendation systems, which predict relevant and appealing items for users on web platforms, often rely on static user interests, resulting in limited interactivity and adaptability. Reinforcement Learning (RL), while providing a dynamic and adaptive approach, brings its unique challenges in this context. Interpreting the behavior of an RL agent within recommendation systems is complex due to factors such as the vast and continuously evolving state and action spaces, non-stationary user preferences, and implicit, delayed rewards often associated with long-term user satisfaction.
Addressing the inherent complexities of applying RL in recommendation systems, we propose a framework that includes innovative metrics and a synthetic environment. The metrics aim to assess the real-time adaptability of an RL agent to dynamic user preferences. We apply this framework to LastFM datasets to interpret metric outcomes and test hypotheses regarding MDP setups and algorithm choices by adjusting dataset parameters within the synthetic environment. This approach illustrates potential applications of our framework, while highlighting the necessity for further research in this area.
Reconstructing Challenging Hand Posture from Multi-modal Input
ABSTRACT. 3D hand reconstruction is critical for immersive VR/AR, action understanding and human healthcare. Existing solutions have concentrated on recovering hand pose and shape using parametric models or learning techniques, without considering actual skin or texture details. In this study, we introduce the first challenging hand dataset, CHANDS, which is composed of precise articulated 3D geometry corresponding to previously unseen challenging gestures performed by real hands. Specifically, we construct a multi-view camera setup to acquire multi-view images for initial 3D reconstructions and use a hand tracker to separately capture the skeleton. Then, we present a robust method for reconstructing an articulated geometry and matching the skeleton to the geometry using a template. In addition, we build a hand pose model from CHANDS that covers a wider range of poses and is particularly helpful for difficult poses.
Identify Vulnerability Types: A Cross-Project Multiclass Vulnerability Classification System based on Deep Domain Adaptation
ABSTRACT. Software Vulnerability Detection (SVD) is an important means of ensuring system security due to the ubiquity of software. Deep learning-based approaches achieve state-of-the-art performance in SVD, but one of the most crucial issues is coping with the scarcity of labeled data in the projects to be detected. One reliable solution is to employ transfer learning to leverage labeled data from other software projects. However, existing cross-project approaches focus only on detecting whether function code is vulnerable or not. The ability to identify vulnerability types is essential because it offers the information needed to patch the vulnerabilities. Our aim in this paper is to propose the first system for cross-project multiclass vulnerability classification. We detect at the granularity of code snippets, which is finer-grained compared to functions and effective for catching inter-procedural vulnerability patterns. After generating code snippets, we define several principles to extract snippet attentions and build a deep model to obtain deep fused features; we then extend different domain adaptation approaches to reduce the feature distribution gap between projects. Experimental results indicate that our system outperforms other state-of-the-art systems.
Two-Stage Graph Convolutional Networks for Relation Extraction
ABSTRACT. The purpose of relation extraction is to extract semantic relationships between entities in sentences, which can be seen as a classification task. In recent years, the use of graph neural networks to handle relation extraction has become increasingly popular. However, most existing graph-based methods have the following problems: 1) they cannot fully utilize dependency relation information; 2) there is no consistent criterion for pruning dependency trees. To address these issues, we propose a two-stage graph convolutional network for relation extraction. In the first stage, the node representations, dependency relation type representations and dependency type weights jointly generate new node representations, fully utilizing the dependency relation information. In the second stage, the graph convolution operation is performed with the adjacency matrix derived from the dependency tree, so the model completes the pruning operation automatically. We evaluate our proposed method on two public datasets, and the results show that our model outperforms previous studies in terms of F1 score and achieves the best performance. Further ablation experiments also confirm the effectiveness of each component of our proposed model.
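The second-stage operation mentioned above is a standard graph convolution over a dependency-tree adjacency matrix; the minimal sketch below shows one such propagation step. The toy dependency tree, feature sizes, and normalization choice are assumptions, and the paper's first-stage fusion of dependency-relation types is not reproduced.

```python
import numpy as np

def gcn_layer(node_feats, adjacency, weight):
    """One graph-convolution step: H' = ReLU(D^-1 (A + I) H W), a generic
    propagation rule over a dependency-tree adjacency matrix."""
    a_hat = adjacency + np.eye(adjacency.shape[0])     # add self-loops
    deg_inv = 1.0 / a_hat.sum(axis=1, keepdims=True)   # row-normalize
    return np.maximum(deg_inv * (a_hat @ node_feats @ weight), 0.0)

# toy sentence of 4 tokens with a small (symmetric) dependency tree
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 1],
                [0, 1, 0, 0],
                [0, 1, 0, 0]], dtype=float)
h = np.random.default_rng(0).standard_normal((4, 8))   # 8-d token features
w = np.random.default_rng(1).standard_normal((8, 8))
print(gcn_layer(h, adj, w).shape)  # (4, 8)
```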
Progressive Temporal Transformer for Bird's-Eye-View Camera Pose Estimation
ABSTRACT. Visual relocalization is a crucial technique used in visual odometry and SLAM to predict the 6-DoF camera pose of a query image. Existing works mainly focus on ground views in indoor or outdoor scenes, whereas camera relocalization on unmanned aerial vehicles has received less attention; frequent view changes and a large depth of view make it more challenging. In this work, we establish a Bird's-Eye-View (BEV) dataset for camera relocalization, a large dataset containing four distinct scenes (\emph{roof}, \emph{farmland}, \emph{bare ground}, and \emph{urban area}) with challenging characteristics such as frequent view changes, repetitive or weak textures and large depths of field. All images in the dataset are associated with a ground-truth camera pose. With 177,242 images, the BEV dataset is a challenging large-scale benchmark for camera relocalization. We also propose a Progressive Temporal transFormer (dubbed PTFormer) as the baseline model. PTFormer is a sequence-based transformer with a progressive temporal aggregation module for exploiting temporal correlation and a parallel absolute and relative prediction head for implicitly modeling the temporal constraint. Thorough experiments on both the BEV dataset and the widely used handheld datasets 7Scenes and Cambridge Landmarks demonstrate the robustness of our proposed method.
Task Scheduling with Multi-strategy Improved Sparrow Search Algorithm in Cloud Datacenters
ABSTRACT. Considering the task scheduling characteristics of the cloud computing environment, an improved sparrow search algorithm (MSSA) that takes into account task completion time, task completion cost and load balancing is proposed. First, initializing the population with PWLCM chaotic mapping enhances the dispersion of individuals. Then, the global search phase of the marine predator algorithm (MPA) is incorporated to enlarge the search space of the discoverers. The introduction of dynamic adjustment factors in the joiner part strengthens the search ability of the algorithm in the early stage and its convergence in the late stage. Finally, a greedy strategy is used to update the joiners' positions so that information from the optimal and worst solutions can guide the next generation of position updates. Simulations in CloudSim show that the improved algorithm yields shorter task completion times and a more balanced system load. Compared with the ACO, MPA, and SSA algorithms, MSSA improves the integrated fitness function values by 20%, 22%, and 17% respectively, confirming the feasibility of the improvement.
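To illustrate what an "integrated fitness" combining time, cost and load balance might look like, the sketch below uses a weighted sum with the standard deviation of VM loads as the balance term. The weights and the balance measure are assumptions for illustration only; the paper's exact fitness function is not given here.

```python
def fitness(makespan, cost, vm_loads, w_time=0.4, w_cost=0.3, w_load=0.3):
    """Hypothetical integrated fitness for a schedule: lower is better.
    Load balance is measured as the standard deviation of per-VM loads."""
    mean = sum(vm_loads) / len(vm_loads)
    load_imbalance = (sum((l - mean) ** 2 for l in vm_loads) / len(vm_loads)) ** 0.5
    return w_time * makespan + w_cost * cost + w_load * load_imbalance

# toy schedule evaluation
print(fitness(makespan=120.0, cost=45.0, vm_loads=[30, 42, 25, 23]))
```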
Traffic Signal Optimization at T-shaped intersections Based on Deep Q Networks
ABSTRACT. In this paper, traffic signal control strategies for T-shaped intersections in urban road networks based on deep Q network (DQN) algorithms are proposed. Different DQN variants and dynamic time aggregation are used for decision-making. The effectiveness of the various strategies under different traffic conditions is evaluated using the Simulation of Urban Mobility (SUMO) software. The simulation results show that the strategy combining the Dueling DQN method with dynamic time aggregation significantly improves vehicle throughput. Compared with DQN and fixed-time methods, this strategy can reduce the average travel time by up to 43\% in low-traffic periods and up to 15\% in high-traffic periods. This study demonstrates the significant advantages of applying Dueling DQN in traffic signal control strategies for urban road networks.
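For readers unfamiliar with the Dueling DQN idea used above, the sketch below shows the standard dueling head that splits the Q-function into state-value and advantage streams. The state encoding (queue lengths per lane) and layer sizes are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Dueling Q-network: shared trunk, then separate state-value and
    advantage streams combined as Q = V + A - mean(A). A generic sketch,
    not the paper's exact architecture or SUMO state representation."""
    def __init__(self, state_dim, num_actions, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)
        self.advantage = nn.Linear(hidden, num_actions)

    def forward(self, state):
        h = self.trunk(state)
        v = self.value(h)                        # (batch, 1)
        a = self.advantage(h)                    # (batch, num_actions)
        return v + a - a.mean(dim=1, keepdim=True)

# hypothetical traffic state: queue lengths on 6 lanes; 3 signal phases
q_net = DuelingQNet(state_dim=6, num_actions=3)
state = torch.rand(4, 6)
print(q_net(state).shape)  # torch.Size([4, 3])
```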
Enhanced State-Aware Traffic Light Optimization Control Method
ABSTRACT. In this paper, the Dueling Double Deep Recurrent Q Network with Attention Mechanism (3DRQN_AM) algorithm is proposed for traffic light control. The algorithm is based on deep Q networks, using dueling networks, double Q networks and target networks to improve its learning performance. A Long Short-Term Memory (LSTM) network is introduced to combine the historical and current states of the vehicle trajectory for optimal decision making. Meanwhile, an attention mechanism is added so that the neural network automatically focuses on the important state components, improving the state representation capability. The experimental results show that, compared with the Dueling Double Deep Q Network with Attention Mechanism (3DQN_AM), the Dueling Double Deep Recurrent Q Network (3DRQN) and the Dueling Double Deep Q Network (3DQN) signal control algorithms, the average waiting time under normal traffic flow is reduced by about 20.8%, 32.1%, and 36.7% respectively, and the average queue length is reduced by about 41.9%, 44.6%, and 76% respectively; under peak traffic flow, the average waiting time is reduced by about 46.2%, 53.3%, and 85.1%, and the average queue length is reduced by about 2.7%, 2.7%, and 21.3%, respectively.
Responsive CPG-Based Locomotion Control for Quadruped Robots
ABSTRACT. Quadruped robots with flexible movement are gradually replacing traditional mobile robots in many scenarios. To improve the motion stability and speed of quadruped robots, this paper presents a responsive gradient CPG (RG-CPG) approach. Specifically, the method introduces a vestibular sensory feedback mechanism into the gradient CPG (central pattern generator) model and uses a differential evolution algorithm to optimize the vestibular sensory feedback parameters. Simulation results show that the movement stability and linear movement velocity of a quadruped robot controlled by RG-CPG are effectively improved, and that the robot can cope with complex terrains. Prototype experiments demonstrate that RG-CPG works on real quadruped robots.
High-Resolution Self-Attention with Fair Loss for Point Cloud Segmentation
ABSTRACT. Applying deep learning techniques to analyze point cloud data acquired by various three-dimensional (3D) sensors has emerged as a prominent research direction. However, the challenges posed by insufficient spatial and feature information integration in the original point cloud and unbalanced classes in real-world datasets have hindered the advancement of research in this domain. In light of the success achieved by self-attention mechanisms in natural language processing and two-dimensional (2D) vision tasks, we put forward the High-Resolution Self-Attention (HRSA) module as a plug-and-play solution for facilitating point cloud segmentation. More precisely, the proposed HRSA module is designed to preserve high-resolution internal representations in both spatial and feature dimensions. HRSA ensures that each branch retains complete spatial and feature information while efficiently compressing the other dimension. Additionally, by affecting the gradient of dominant and weak classes, we introduce the Fair Loss to address the problem of unbalanced class distribution on a real-world dataset to improve the network's inference capabilities. The introduced modules are seamlessly integrated into an MLP-based architecture tailored for large-scale point cloud processing, resulting in a new segmentation network called PointHR. PointHR achieves impressive performance with mIoU scores of 69.8% and 74.5% on S3DIS Area-5 and 6-fold cross-validation, respectively. With a significantly smaller number of parameters, these performances are remarkably close to the state-of-the-art methods, making PointHR highly competitive in point cloud semantic segmentation.
Minimizing Distortion in Linguistic Steganography via Adaptive Language Model Tuning
ABSTRACT. Linguistic steganography, a technique that hides secret information within normal text, possesses tremendous potential in various applications such as protecting user privacy. However, previous research in linguistic steganography has primarily focused on adjusting the probability distribution of steganographic text (stegotext) to minimize the difference with text generated by language models, thereby achieving indistinguishability between the two. Nonetheless, the significant gap between real text and generated text has often been overlooked. To address this issue, this paper proposes an innovative method: using an adaptive model tuning strategy, the generated stegotext becomes statistically closer to real text. We leverage a well-trained classifier in conjunction with a fundamental generative language model to produce stegotext that aligns closely with the distribution of real text. Consequently, we gain better control over the distortion between the stegotext and real text, while effectively embedding secret information. Compared to traditional methods, our approach reduces Kullback-Leibler divergence and steganography detection rates, demonstrating its enhanced effectiveness.
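Since the method above is evaluated partly through Kullback-Leibler divergence, the short sketch below shows a generic KL computation between two discrete distributions, such as token-frequency vectors of real text and stegotext. The counts are toy values; the paper measures distortion over language-model distributions rather than raw counts.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) between two discrete distributions given as count or
    probability vectors over a shared vocabulary (illustrative only)."""
    p = np.asarray(p, dtype=np.float64) + eps
    q = np.asarray(q, dtype=np.float64) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

# toy token-frequency vectors over a small shared vocabulary
real_counts = [120, 80, 40, 10]
stego_counts = [100, 90, 50, 12]
print(kl_divergence(real_counts, stego_counts))
```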
Nonlinear NN-Based Perturbation Estimator Designs for Disturbed Unmanned Systems
ABSTRACT. This paper deals with the perturbation estimation problem for a classical unmanned system subject to perturbations composed of internal system uncertainties and external disturbances. In order to approximate the unmeasurable perturbations accurately, a new nonlinear radial basis function neural network (RBFNN)-based estimator is developed to reconstruct the structure of the perturbations. It is established through Lyapunov stability analysis that asymptotic estimation can be achieved with the RBFNN-based estimator designs. The efficacy of the developed perturbation estimation method is substantiated by simulations of an unmanned marine system and a quadrotor system.
FLDNet: A Foreground-Aware Network for Polyp Segmentation Leveraging Long-Distance Dependencies
ABSTRACT. Given the close association between colorectal cancer and polyps, the diagnosis and identification of colorectal polyps play a critical role in the detection and surgical intervention of colorectal cancer. In this context, the automatic detection and segmentation of polyps from colonoscopy images has emerged as a significant problem that has attracted broad attention. Current polyp segmentation techniques face several challenges: firstly, polyps vary in size, texture, color, and pattern; secondly, the boundaries between polyps and mucosa are usually blurred. Moreover, existing studies have focused on learning the local features of polyps while ignoring long-range feature dependencies and the combination of local and global contextual information. To address these challenges, we propose FLDNet (Foreground-Long-Distance Network), a Transformer-based neural network that captures long-distance dependencies for accurate polyp segmentation. Specifically, the proposed model consists of three main modules: a pyramid-based Transformer encoder, a local context module, and a foreground-aware module. Multilevel features with long-distance dependency information are first captured by the pyramid-based Transformer encoder. On the high-level features, the local context module obtains the local characteristics related to the polyps by constructing different local context information. The coarse map obtained by decoding the reconstructed highest-level features guides the feature fusion process in the foreground-aware module to achieve foreground enhancement of the polyps. FLDNet was evaluated with seven metrics on common datasets and demonstrated superiority over state-of-the-art methods on widely used evaluation measures.
MelMAE-VC: Extending Masked Autoencoders to Voice Conversion
ABSTRACT. Voice conversion is a technique that generates speech with the same text content as a source speech and the same timbre as a reference speech. This paper proposes MelMAE-VC, a neural network for non-parallel many-to-many voice conversion that utilizes pre-trained Masked Autoencoders (MAEs) for representation learning. Our network consists mainly of transformer layers and no recurrent units, aiming to achieve better scalability and parallel computing capability. In the pre-training phase, we follow a scheme similar to image-based MAE, concealing a portion of the input spectrogram and setting up a vanilla autoencoding task: the encoder yields a latent representation from the visible subset of the full spectrogram, and the decoder reconstructs the full spectrogram from the representation of only the visible patches. To achieve voice conversion, we adopt the pre-trained encoder to extract preliminary features, then use a speaker embedder to control the timbre of the synthesized spectrograms. The style transfer decoder can be either a simple autoencoder or a conditional variational autoencoder (CVAE) that mixes timbre and text information from different utterances. The training objective of the voice conversion model is a hybrid loss function that combines reconstruction loss, style loss and stochastic similarity. Results show that our model speeds up and simplifies the training process, and has better modularity and scalability while achieving performance similar to other models.
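The MAE-style pre-training step described above can be illustrated with a minimal patch-masking sketch over a mel-spectrogram: the input is split into non-overlapping patches and only a random subset is kept visible for the encoder. Patch size, mask ratio, and spectrogram shape are illustrative assumptions; the actual MelMAE-VC patching and encoder are not shown.

```python
import numpy as np

def random_patch_mask(spec, patch=(16, 16), mask_ratio=0.75, seed=0):
    """Split a (freq, time) spectrogram into non-overlapping patches and keep
    a random subset, as in MAE-style pre-training (simplified sketch)."""
    f, t = spec.shape
    pf, pt = patch
    nf, nt = f // pf, t // pt
    patches = spec[:nf * pf, :nt * pt].reshape(nf, pf, nt, pt).transpose(0, 2, 1, 3)
    patches = patches.reshape(nf * nt, pf, pt)          # (num_patches, pf, pt)
    rng = np.random.default_rng(seed)
    num_keep = int(round(len(patches) * (1 - mask_ratio)))
    keep_idx = rng.permutation(len(patches))[:num_keep]
    return patches[keep_idx], keep_idx                  # visible patches only

mel = np.random.rand(80, 256)   # hypothetical 80-bin mel-spectrogram
visible, idx = random_patch_mask(mel)
print(visible.shape, len(idx))  # (20, 16, 16) 20
```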
RPF3D: Range-Pillar Feature Deep Fusion 3D Detector for Autonomous Driving
ABSTRACT. In this paper, we present RPF3D, an innovative single-stage framework that explores the complementary nature of point clouds and range images for 3D object detection. Our method addresses the sampling region imbalance issue inherent in fixed-dilation-rate convolutional layers, allowing for a more accurate representation of the input data. To enhance the model's adaptability, we introduce several attention layers that accommodate a wide range of dilation rates necessary for processing range image scenes. To tackle the challenges of feature fusion and alignment, we propose the AttentiveFusion module and the Range Image Guided Deep Fusion (RIGDF) backbone architecture in the Range-Pillar Feature Fusion section, which effectively addresses the one-pillar-to-multiple-pixels feature alignment problem caused by the point cloud encoding strategy. These innovative components work together to provide a more robust and accurate fusion of features for improved 3D object detection. We validate the effectiveness of our RPF3D framework through extensive experiments on the KITTI and Waymo Open Datasets. The results demonstrate the superior performance of our approach compared to existing methods, particularly in the Car class detection where a significant enhancement is achieved on both datasets. This showcases the practical applicability and potential impact of our proposed framework in real-world scenarios and emphasizes its relevance in the domain of 3D object detection.
P-IoU: Accurate Motion Prediction based Data Association for Multi-Object Tracking
ABSTRACT. Multi-object tracking in complex scenarios remains a challenging task due to objects' irregular motions and indistinguishable appearances. Traditional methods often approximate the motion direction of objects solely based on their bounding box information, leading to cumulative noise and incorrect association. Furthermore, the lack of depth information in these methods can result in failed discrimination between foreground and background objects due to the perspective projection of the camera. To address these limitations, we propose a Pose Intersection over Union (P-IoU) method to predict the true motion direction of objects by incorporating body pose information, specifically the motion of the human torso. Based on P-IoU, we propose PoseTracker, a novel approach that combines bounding box IoU and P-IoU effectively during association to improve tracking performance. Exploiting the relative stability of the human torso and the confidence of keypoints, our method effectively captures the genuine motion cues, reducing identity switches caused by irregular movements. Experiments on the DanceTrack and MOT17 datasets demonstrate that the proposed PoseTracker outperforms existing methods. Our method highlights the importance of accurate motion prediction of objects for data association in MOT and provides a new perspective for addressing the challenges posed by irregular object motion.
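As an illustration of this association scheme, the following minimal Python sketch (not the authors' implementation; the COCO torso keypoint indices, blending weight, and gating threshold are assumptions) blends bounding-box IoU with a torso-derived pose IoU into one cost matrix and solves the assignment with the Hungarian algorithm.

import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    # a, b: [x1, y1, x2, y2]
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def torso_box(kpts, conf, thr=0.3):
    # kpts: (17, 2) COCO keypoints, conf: (17,) confidences; shoulders and hips
    # (indices 5, 6, 11, 12) approximate the torso, assumed to be the most stable body part.
    idx = np.array([5, 6, 11, 12])
    pts = kpts[idx][conf[idx] > thr]
    if len(pts) < 2:
        return None
    (x1, y1), (x2, y2) = pts.min(axis=0), pts.max(axis=0)
    return [x1, y1, x2, y2]

def associate(tracks, detections, lam=0.5, max_cost=0.8):
    # tracks / detections: lists of dicts with 'box', 'kpts', 'kconf'
    cost = np.ones((len(tracks), len(detections)))
    for i, t in enumerate(tracks):
        for j, d in enumerate(detections):
            sim = iou(t['box'], d['box'])
            tb, db = torso_box(t['kpts'], t['kconf']), torso_box(d['kpts'], d['kconf'])
            if tb is not None and db is not None:
                sim = (1 - lam) * sim + lam * iou(tb, db)   # blend box IoU with pose IoU
            cost[i, j] = 1.0 - sim
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < max_cost]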
RF-Based Drone Detection and Identification with Deep Neural Network: Review and Case Study
ABSTRACT. Drones have been widely used in many application scenarios such as logistics and on-demand instant delivery, surveillance, traffic monitoring, firefighting, photography, and recreation. On the other hand, a growing number of cases of drone misuse and malicious utilization are being reported on a local and global scale. Thus, it is essential to employ security measures to reduce these risks. Drone detection is a crucial initial step in several tasks such as identifying, locating, tracking, and intercepting malicious drones. This paper reviews related work on drone detection and classification using deep neural networks. Moreover, it presents a case study comparing the impact of using the magnitude and phase spectra as input to the classifier. The findings indicate that prediction performance is better when the magnitude spectrum is used. However, the phase spectrum can be more resilient to errors due to signal attenuation and changes in the surrounding conditions.
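The magnitude-versus-phase comparison can be illustrated with a minimal sketch of the two candidate input representations (hypothetical window and hop sizes, not the paper's configuration):

import numpy as np

def spectra(signal, win=1024, hop=512):
    # Frame the RF recording, apply a Hann window, and take the FFT of each frame.
    frames = np.lib.stride_tricks.sliding_window_view(signal, win)[::hop]
    window = np.hanning(win)
    spec = np.fft.rfft(frames * window, axis=-1)
    magnitude = 20 * np.log10(np.abs(spec) + 1e-12)   # log-magnitude spectrogram
    phase = np.angle(spec)                            # phase spectrogram in radians
    return magnitude, phase

rf = np.random.randn(100_000)          # stand-in for a recorded RF segment
mag, ph = spectra(rf)
# Either representation can be stacked into an image-like tensor and fed to a CNN
# classifier; the case study compares the two as alternative inputs.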
ABSTRACT. Various active learning methods with ingenious sampling strategies have been proposed to address the lack of labeled samples in supervised learning, but most are designed for specific tasks. In this paper, we propose a simple but task-agnostic active sampling method. We introduce a "multi-view clustering module" to extract multiple feature maps at different levels for unsupervised clustering. According to the clustering distribution, we calculate consistency, representativeness, and stability to guide sampling and training. Our method does not depend on a specific network and can be constructed as a two-stage sampling module to supplement existing sampling algorithms. Experiments on image classification and object detection tasks show that our method can further enhance the effect of active learning on top of the baseline methods.
Graph Pointer Network and Reinforcement Learning for Thinnest Path Problem
ABSTRACT. The complexity and NP-hard nature of combinatorial optimization problems (COPs) make finding optimal solutions with traditional methods challenging. Recently, deep learning-based approaches have shown promise in solving COPs. The Pointer Network (PN) has become a popular choice due to its ability to handle variable-length sequences and generate variable-sized outputs. The Graph Pointer Network (GPN), which incorporates graph embedding layers into the PN, is well-suited for problems with graph structures. Additionally, Reinforcement Learning (RL) has great potential for enhancing scalability when solving large-scale instances. In this paper, we focus on the Thinnest Path Problem (TPP). We propose an approach that uses RL to train a GPN with constraints (GPN-c) to solve the TPP. Our approach outperforms traditional solutions by providing faster and more efficient solving strategies. Specifically, we achieve significant improvements in solution quality, runtime, and scalability, successfully extending our approach to instances with up to 500 nodes. Furthermore, RL and GPN can provide more flexible and adaptive solving strategies, making them highly applicable to real-world scenarios.
A DNN-based Learning Framework for Continuous Movements Segmentation
ABSTRACT. This study presents a novel experimental paradigm for collecting Electromyography (EMG) data from continuous movement sequences and a Deep Neural Network (DNN) learning framework for segmenting movements from these signals. Unlike prior research focusing on individual movements, this approach characterizes human motion as continuous sequences. The DNN framework comprises a segmentation module for time-point-level labeling of EMG data and a transfer module predicting movement transition time points; these outputs are integrated based on defined rules. Experimental results reveal an impressive capacity to accurately segment movements, as evidenced by segmentation metrics (accuracy: 88.3%; Dice coefficient: 82.9%; mIoU: 72.7%). This approach to time-point-level analysis of continuous movement sequences via EMG signals offers promising implications for future studies of human motor functions and the advancement of human-machine interaction systems.
User Multi-Preferences Fusion for Conversational Recommender Systems
ABSTRACT. Conversational recommender systems (CRS) aim to provide recommendations by inferring user preferences during conversations. Many current CRS models utilize third-party information, such as reviews, to supplement the extraction of user preferences. Consequently, users develop preferences for third-party information alongside their own preferences extracted from the original dialog data. However, the prevailing approach of combining these preferences as a unified whole for self-attention at the element level compromises their independence. In real-life decision-making, we refer to third-party information, and it is important to distinguish whether a reference comes from a third party or from the original dialog data. This paper emphasizes the independence of users' own preferences and third-party information. To effectively integrate multiple user preferences, we propose an Attentive Wide&Deep Conversational Recommender (AWDCore). Specifically, we design an attentive wide linear module and an attentive deep neural network to capture the low-order linear and high-order nonlinear relationships between the user's own preference and third-party information, respectively. To highlight the significance of the user's current preference, we incorporate attention mechanisms and a SENet layer in the wide module and deep neural networks, respectively. The learned user preferences are then employed for recommendation and dialogue generation. Extensive experiments have demonstrated the effectiveness of our approach in both recommendation and conversation tasks.
Data Protection and Privacy: Risks and Solutions in The Contentious Era of AI-driven Ad Tech
ABSTRACT. Internet Service Providers (ISPs) increasingly incorporate Artificial Intelligence (AI) algorithms and Machine Learning techniques to achieve commercial objectives of extensive data harvesting and manipulation to drive customer traffic, decrease costs, and exploit the virtual public sphere through innovative Advertisement Technology (Ad-Tech). The increasing incorporation of Generative AI to aggregate and classify collected data and generate persuasive content tailored to the behavioral patterns of users calls into question their information security and informational self-determination. Significant risks arise with inappropriate information processing and analysis of big data, including personal and special data protected by influential data protection and privacy regulation worldwide. This paper bridges the interdisciplinary gap between developers of AI applications and their socio-legal impact on democratic societies. Accordingly, the work asks how AI Behavioural Marketing poses inadequately addressed data and privacy protection risks, followed by an approach to mitigate them. The approach is developed through doctrinal research of regulatory frameworks and court decisions to establish the current legislative landscape for AI-driven Ad-Tech. The work exposes the pertinent risks triggered by algorithms by analyzing the discourse of academics, developers, and social groups. It argues that understanding the risks associated with Information Processing and Invasion is seminal to developing appropriate industry solutions through a cumulative layered approach. This work is timely in addressing these contentious issues using conventional and non-conventional approaches and aspires to promote pragmatic collaboration between developers and policymakers to address risks throughout the AI Value Chain to safeguard individuals' data protection and privacy rights.
CAS-NN: a robust cascade neural network without compromising clean accuracy
ABSTRACT. Adversarial training has emerged as a prominent approach for training robust classifiers. However, recent research indicates that adversarial training inevitably results in a decline in a classifier's accuracy on clean (natural) data. Robustness is at odds with clean accuracy due to the inherent tension between the objectives of adversarial robustness and standard generalization, and training a single classifier that combines high adversarial robustness and high clean accuracy appears to be an insurmountable challenge. This paper proposes a straightforward strategy to bridge the gap between robustness and clean accuracy. Inspired by the idea underlying dynamic neural networks, i.e., adaptive inference, we propose a robust cascade framework that integrates a standard classifier and a robust classifier. The cascade neural network dynamically classifies clean and adversarial samples using distinct classifiers based on the confidence score of each input sample. As deep neural networks suffer from a serious overconfidence problem on adversarial samples, we propose an effective confidence calibration algorithm for the standard classifier, enabling accurate confidence scores for adversarial samples. The robust classifier within the cascade framework is trained independently and can be combined with any state-of-the-art adversarial training algorithm. The experiments demonstrate that the proposed cascade neural network increases clean accuracy by 10.1%, 14.67%, and 9.11% compared to advanced adversarial training (HAT) on CIFAR10, CIFAR100, and Tiny ImageNet, respectively, while maintaining similar robust accuracy.
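A confidence-gated cascade of this kind can be sketched as follows (a minimal PyTorch illustration, not the authors' implementation; the threshold tau is a placeholder and the confidence calibration step is omitted):

import torch
import torch.nn.functional as F

@torch.no_grad()
def cascade_predict(x, standard_model, robust_model, tau=0.9):
    # First stage: the standard (clean-accurate) classifier scores every input.
    logits_std = standard_model(x)
    conf, pred = F.softmax(logits_std, dim=1).max(dim=1)
    # Low confidence is treated as a sign of a (possibly adversarial) hard input
    # and those samples are routed to the adversarially trained classifier.
    use_robust = conf < tau
    if use_robust.any():
        logits_rob = robust_model(x[use_robust])
        pred[use_robust] = logits_rob.argmax(dim=1)
    return pred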
Optimal Task Grouping Approach in Multitask Learning
ABSTRACT. Multi-task learning has become a powerful solution in which multiple tasks are trained together to leverage the knowledge learned from one task to improve the performance of the others. However, tasks are not always constructive for each other in the multi-task formulation and may interact negatively during training, leading to poor results. Thus, this study focuses on finding the optimal group of tasks that should be trained together for multi-task learning in an automotive context. We propose a multi-task learning approach to model multiple long-term vehicle behaviors using low-resolution data and utilize gradient descent to efficiently discover the optimal group of tasks/vehicle behaviors that can increase the performance of the predictive models in a single training process. We also quantify the contribution of individual tasks within their groups and to the other groups' performance. The experimental evaluation on data collected from thousands of heavy-duty trucks shows that the proposed approach is promising.
T Cell Receptor Protein Sequences and Sparse Coding: A Novel Approach to Cancer Classification
ABSTRACT. Cancer is a complex disease characterized by uncontrolled cell growth and proliferation, which can lead to the development of tumors and metastases. Identifying the cancer type is crucial for selecting the most appropriate treatment strategy and improving patient outcomes. T cell receptors (TCRs) are essential proteins for the adaptive immune system, and their specific recognition of antigens plays a crucial role in the immune response against diseases, including cancer. The diversity and specificity of TCRs make them ideal for targeting cancer cells, and recent advances in sequencing technologies have enabled the comprehensive profiling of TCR repertoires. This has led to the discovery of TCRs with potent anti-cancer activity and the development of TCR-based immunotherapies. To analyze these complex biomolecules effectively, it is essential to represent them in a way that captures their structural and functional information. In this study, we investigate the use of sparse coding for the multi-class classification of TCR protein sequences with cancer categories as target labels. Sparse coding is a popular technique in machine learning that represents data with a set of informative features; it can capture complex relationships between amino acids and identify subtle patterns in the sequence that might be missed by low-dimensional methods. We first compute the $k$-mers from the TCR sequences and then apply sparse coding to capture the essential features of the data. To improve the predictive performance of the final embeddings, we integrate domain knowledge regarding different cancer properties such as Human leukocyte antigen (HLA) types, gene mutations, clinical characteristics, immunological features, and epigenetic modifications. We then train different machine learning (linear and non-linear) classifiers on the embeddings of the TCR sequences for supervised analysis. On a benchmark dataset of TCR sequences, our proposed embedding method significantly outperforms the baselines in terms of predictive performance, achieving an accuracy of 99.8\%. Our study highlights the potential of sparse coding for the analysis of TCR protein sequences in cancer research and other related fields.
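A minimal sketch of the k-mer plus sparse-coding pipeline, using scikit-learn dictionary learning on toy sequences (k, the dictionary size, and the example sequences are assumptions, not the paper's setup):

from itertools import product
import numpy as np
from sklearn.decomposition import DictionaryLearning

AA = "ACDEFGHIKLMNPQRSTVWY"          # 20 standard amino acids
K = 3                                 # k-mer length (an assumption)
KMER_INDEX = {"".join(p): i for i, p in enumerate(product(AA, repeat=K))}

def kmer_vector(seq):
    # Count every overlapping k-mer in the sequence.
    v = np.zeros(len(KMER_INDEX))
    for i in range(len(seq) - K + 1):
        idx = KMER_INDEX.get(seq[i:i + K])
        if idx is not None:
            v[idx] += 1
    return v

# Toy CDR3-like sequences standing in for a TCR repertoire dataset.
sequences = ["CASSLGQGAEAFF", "CASSPTGGELFF", "CASSLAGGYNEQFF", "CASRDRTGNGYTF"]
X = np.stack([kmer_vector(s) for s in sequences])

# Learn a small dictionary and represent each sequence by its sparse code;
# the sparse codes then act as embeddings for a downstream classifier.
coder = DictionaryLearning(n_components=8, alpha=1.0, max_iter=20, random_state=0)
embeddings = coder.fit_transform(X)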
GACE: Learning Graph-Based Cross-Page Ads Embedding For Click-Through Rate Prediction
ABSTRACT. Predicting click-through rate (CTR) is the core task of many online ad recommendation systems, as it helps improve user experience and increase platform revenue. In this type of recommendation system, we often encounter two main problems: the joint usage of multi-page historical advertising data and the cold start of new ads. In this paper, we propose GACE, a graph-based cross-page ads embedding generation method. It can warm up and generate the representation embeddings of cold-start and existing ads across various pages. Specifically, we carefully build linkages and a weighted undirected graph model considering semantic and page-type attributes to guide the direction of feature fusion and generation. We design a variational auto-encoding task as a pre-training module and generate embedding representations for new and old ads based on this task. Evaluation results on the public AliEC dataset and a real-world industry dataset from Alipay show that our GACE method is significantly superior to the SOTA methods. In an online A/B test, the click-through rate on three real-world pages from Alipay increased by 3.6%, 2.13%, and 3.02%, respectively. In the cold-start task in particular, the CTR increased by 9.96%, 7.51%, and 8.97%, respectively.
Can You Really Reason: A Novel Framework for Assessing Natural Language Reasoning Datasets and Models
ABSTRACT. Recent research has revealed that numerous natural language understanding (NLU) and reasoning datasets contain statistical cues that sophisticated models can exploit, leading to an overestimation of these models' capabilities. However, no existing work has precisely identified these cues. In this paper, we propose a lightweight framework that automatically detects potential biases in any multiple-choice NLU-related dataset and evaluates the robustness of models designed for these datasets. Furthermore, this framework has the potential to filter biased training data, enabling the training of models with improved reasoning capabilities. By addressing the issue of dataset biases, our framework contributes to the development of more robust and accurate NLU models.
A Meta Learning-based Training Algorithm for Robust Dialogue Generation
ABSTRACT. There are many low-resource scenarios in the field of dialogue generation, such as medical diagnosis. Dialogue generation models in these scenarios are usually unstable due to the lack of training corpora. As one of the most popular training algorithms in recent years, meta learning has achieved remarkable results. MAML, a meta learning method, can find a fast-adapting initialization for a series of low-resource tasks; it is often used to address low-resource problems and has achieved excellent performance in image classification. However, in text generation tasks such as dialogue generation, the large vocabulary, long sequences, and large number of parameters involved make the effect of MAML unstable. Therefore, this paper proposes a highly robust meta learning-based training framework for the dialogue generation task. By identifying the significant information in the model parameters, the optimizer can concentrate training on the important parameters within the limited data during bi-level optimization. Experiments show that our method achieves good BLEU scores on six single-domain datasets.
Extraction of One Time Point Dynamic Group Features via Tucker Decomposition of Multi-Subject FMRI Data: Application to Schizophrenia
ABSTRACT. Group temporal and spatial features of multi-subject fMRI data are essential for studying mental disorders, especially those exhibiting dynamic properties of brain function. Taking advantage of a low-rank Tucker model in effectively extracting temporally and spatially shared features of multi-subject fMRI data, we propose to extract dynamic group features via Tucker decomposition for identifying patients with schizophrenia (SZs) from healthy controls (HCs). We segment multi-subject fMRI data using a sliding-window technique with different window lengths and a step size of one time point, and analyze the amplitude of low-frequency fluctuations and voxel features for the shared time courses and shared spatial maps obtained by Tucker decomposition of the segmented data. Results of two-sample t-tests show that HCs have higher amplitudes of low-frequency fluctuations within 0.01~0.08 Hz than SZs for window lengths of 40s~160s, and significant HC-SZ activation differences exist in regions such as the inferior parietal lobule and the left auditory region within the 40s window, providing new evidence for analyzing schizophrenia.
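A minimal sketch of the sliding-window Tucker analysis using the tensorly library (toy data shapes, ranks, and window length are placeholders, not the study's settings):

import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

subjects, timepoints, voxels = 10, 150, 5000
data = np.random.randn(subjects, timepoints, voxels)   # stand-in for fMRI data

window, step = 40, 1                                   # step size of one time point
for start in range(0, timepoints - window + 1, step):
    segment = tl.tensor(data[:, start:start + window, :])
    # core: subject-time-space interactions; factors[1] holds shared time courses,
    # factors[2] holds shared spatial maps for this window.
    core, factors = tucker(segment, rank=[5, 10, 20])
    shared_time_courses, shared_spatial_maps = factors[1], factors[2]
    break  # one window shown; in practice every window is analyzed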
Detecting Depression and Alcoholism Disorders by EEG Signal
ABSTRACT. The World Health Organization estimates that more than 264 and 80 million patients worldwide suffer from depression and alcoholism, respectively. Depression and alcoholism can cause severe negative repercussions on a patient's life and relationships, such as self-harm and suicide. A person can lead a normal life if these brain disorders are diagnosed and treated in a timely and accurate manner. Electroencephalography (EEG) is often employed to observe the brain's activity and identify different mental disorders. In our study, the EEG signals are separated into rhythms in the empirical wavelet transform domain, and then linear and nonlinear features are extracted. Significant features are selected by a feature selection method, and the output of the feature selection method is fed into a classifier. In this paper, a fast and effective diagnostic tool is proposed to detect and recognize depression and alcoholism disorders. The proposed diagnostic tool is built on the Salp Swarm Algorithm and the Tree Growth Algorithm as feature selection methods and Cascade Forward Neural Network and Feed-forward Neural Network classifiers. The diagnostic tool is evaluated on two datasets for depression and alcoholism, and the results show classification accuracies of 100\% and 99.58\% for depression and alcoholism, respectively, using a 10-fold cross-validation strategy. The proposed diagnostic tool can be used in hospitals and clinics for fast and accurate detection of depression and alcoholism.
Three-Dimensional Medical Image Fusion with Deformable Cross-Attention
ABSTRACT. Multimodal medical image fusion plays an instrumental role in several areas of medical image processing, particularly in disease recognition and tumor detection. Traditional fusion methods tend to process each modality independently before combining the features and reconstructing the fusion image. However, this approach often neglects the fundamental commonalities and disparities between multimodal information. Furthermore, the prevailing methodologies are largely confined to fusing two-dimensional (2D) medical image slices, leading to a lack of contextual supervision in the fusion images and subsequently, a decreased information yield for physicians relative to three-dimensional (3D) images. In this study, we introduce an innovative unsupervised feature mutual learning fusion network designed to rectify these limitations. Our approach incorporates a Deformable Cross Feature Blend (DCFB) module that facilitates the dual modalities in discerning their respective similarities and differences. We have applied our model to the fusion of 3D MRI and PET images obtained from 660 patients in the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. Through the application of the DCFB module, our network generates high-quality MRI-PET fusion images. Experimental results demonstrate that our method surpasses traditional 2D image fusion methods in performance metrics such as Peak Signal to Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM). Importantly, the capacity of our method to fuse 3D images enhances the information available to physicians and researchers, thus marking a significant step forward in the field. The code will soon be available online.
Handling Class Imbalance in Forecasting Parkinson's Disease Wearing-off with Fitness Tracker Dataset
ABSTRACT. Parkinson's disease (PD) patients experience the "wearing-off phenomenon", where their symptoms resurface before they can take the next dose of medication. As time passes, the duration of the medicine's efficacy shortens, leading to discomfort among PD patients. Therefore, patients and clinicians must meticulously observe and document symptom changes to administer appropriate treatment.
Forecasting the PD wearing-off phenomenon is challenging due to the class imbalance that results from the difficulty of documenting the phenomenon. This paper compares different approaches for handling class imbalance in forecasting the PD wearing-off phenomenon using a fitness tracker and smartwatch dataset (the wearing-off dataset): oversampling, undersampling, and a combination of the two. Previous studies reported the potential of commercially available fitness tracker datasets to predict and forecast wearing-off periods. However, high false positive and false negative rates have been observed for some participants with the developed models.
This paper uses and compares different approaches to handling class imbalance in the wearing-off dataset. First, changes were made during the data collection phase, as the nursing staff struggled with the data collection tool. Second, different oversampling and undersampling techniques were tried to improve the ratio of wearing-off labels to non-wearing-off instances. Finally, adjustments to forecast probabilities were applied due to the resampling in the second step.
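The resampling comparison can be sketched with imbalanced-learn as follows (the synthetic features, classifier, and the specific combined strategy shown are assumptions, not the study's exact pipeline):

import numpy as np
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from imblearn.combine import SMOTETomek
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

X = np.random.randn(1000, 8)                      # stand-ins for heart rate, steps, sleep, ...
y = (np.random.rand(1000) < 0.1).astype(int)      # roughly 10% wearing-off labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for name, sampler in [("oversample (SMOTE)", SMOTE(random_state=0)),
                      ("undersample", RandomUnderSampler(random_state=0)),
                      ("combined", SMOTETomek(random_state=0))]:
    # Resample only the training split, then fit and evaluate a classifier.
    X_rs, y_rs = sampler.fit_resample(X_tr, y_tr)
    clf = RandomForestClassifier(random_state=0).fit(X_rs, y_rs)
    print(name, f1_score(y_te, clf.predict(X_te)))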
BIN: A Biosignature Identification Network for Interpretable Liver Cancer Microvascular Invasion Prediction based on Multi-modal MRIs
ABSTRACT. Microvascular invasion (MVI) is a critical factor that affects the postoperative cure of hepatocellular carcinoma (HCC). Precise preoperative diagnosis of MVI by magnetic resonance imaging (MRI) is crucial for effective treatment of HCC. Compared with traditional methods, deep learning-based MVI diagnostic models have shown significant improvements. However, the black-box nature of deep learning models poses a challenge to their acceptance in medical fields that demand interpretability. To address this issue, this paper proposes an interpretable deep learning model, called Biosignature Identification Network (BIN), based on multi-modal MRI images for the liver cancer MVI prediction task. Inspired by the way species are distinguished in biology through their biosignatures, the proposed BIN method classifies patients into MVI absence (i.e., Non-MVI or negative) and MVI presence (i.e., positive) by utilizing Non-MVI and MVI biosignatures. The adoption of a transparent decision-making process in BIN ensures interpretability, while the biosignatures in the model overcome the limitations associated with manual feature extraction. Moreover, a multi-modal MRI-based BIN method is also explored to further enhance diagnostic performance and to offer interpretability for multi-modal MRI fusion. Through extensive experiments on a real dataset, we find that BIN maintains deep-model-level performance while providing effective interpretability. Overall, the proposed model offers a promising solution to the challenge of interpreting deep learning-based MVI diagnostic models.
KSHFS: Research on Drug-Drug Interaction Prediction Based on Knowledge Subgraph and High-order Feature-aware Structure
ABSTRACT. Effective drug-drug interaction (DDI) prediction can prevent adverse reactions and side effects caused by taking multiple drugs at the same time. However, most methods that obtain drug information from large-scale biomedical knowledge graphs (KGs) ignore the problems of high noise and complexity and have limitations in obtaining rich neighborhood information for each entity in the KG. Therefore, this paper proposes an end-to-end method called Knowledge Subgraph and High-order Feature-aware Structure (KSHFS) for DDI prediction. In KSHFS, we first design a subgraph extraction module to reduce the noise caused by the KG, remove irrelevant information, and effectively utilize the entity information in external knowledge graphs to assist DDI prediction. Then, a high-order feature-aware module is designed to aggregate entity information propagated from high-order neighbors, learn high-order structural embeddings for each entity, and effectively capture potential semantic neighborhood features of drug pairs. Finally, for binary DDI prediction, a self-attention mechanism is used for feature fusion to predict drug interaction events. The experimental results demonstrate that the proposed KSHFS model outperforms the baseline models in both binary and multi-relation DDI prediction on various evaluation metrics, including AUC, AUPR, and F1.
ASGNet: Adaptive Semantic Gate Networks for Log-Based Anomaly Diagnosis
ABSTRACT. Logs are widely used in the development and maintenance of software systems. Logs can help engineers understand the runtime behavior of systems and diagnose system failures. For anomaly diagnosis, existing methods generally use log event data extracted from historical logs to build diagnostic models. However, we find that existing methods do not make full use of two types of features: (1) statistical features: inherent statistical features in log data, such as word frequency and abnormal label distribution, are not well exploited; compared with raw log data, statistical features are deterministic and naturally compatible with the corresponding tasks; (2) semantic features: logs contain the execution logic behind software systems, so log statements share deep semantic relationships. How to effectively combine statistical and semantic features in log data to improve the performance of log anomaly diagnosis is the key question of this paper. We propose an Adaptive Semantic Gate Network (ASGNet) that combines statistical and semantic features and selectively uses statistical features to consolidate the semantic representation of log text. Specifically, ASGNet encodes statistical features via a variational encoding module and fuses useful information through a well-designed adaptive semantic threshold mechanism. The threshold mechanism introduces the information flow into the classifier based on the confidence of the semantic features in the decision, which is conducive to training a robust classifier and alleviates the overfitting problem caused by the use of statistical features. Experimental results on real datasets show that our proposed method is superior to all baseline methods in terms of various performance indicators.
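A minimal PyTorch sketch of a statistics-conditioned gate of this kind (dimensions and the fusion rule are assumptions, not the exact ASGNet design):

import torch
import torch.nn as nn

class SemanticGate(nn.Module):
    def __init__(self, sem_dim=256, stat_dim=16):
        super().__init__()
        self.stat_proj = nn.Linear(stat_dim, sem_dim)
        self.gate = nn.Sequential(nn.Linear(sem_dim + stat_dim, sem_dim), nn.Sigmoid())

    def forward(self, sem, stat):
        # sem: (batch, sem_dim) semantic encoding of a log sequence
        # stat: (batch, stat_dim) statistical features (word frequency, label stats, ...)
        g = self.gate(torch.cat([sem, stat], dim=-1))      # per-dimension confidence gate
        return sem + g * self.stat_proj(stat)              # gated injection of statistics

fused = SemanticGate()(torch.randn(4, 256), torch.randn(4, 16))  # fed to the classifier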
Can We Transfer Noise Patterns? A Multi-environment Spectrum Analysis Model Using Generated Cases
ABSTRACT. Spectrum analysis systems in online water quality testing are designed to detect the types and concentrations of pollutants and enable regulatory agencies to respond promptly to pollution incidents. However, spectral data-based testing devices suffer from complex noise patterns when deployed in non-laboratory environments. To make the analysis model applicable to more environments, we propose a noise-pattern transfer model, which takes the spectra of standard water samples in different environments as cases and learns the differences in their noise patterns, thus enabling noise patterns to be transferred to unknown samples. Unfortunately, the inevitable sample-level baseline noise prevents the model from obtaining paired data that differ only in dataset-level environmental noise. To address this problem, we generate a sample-to-sample case base to exclude the interference of sample-level noise on dataset-level noise learning, enhancing the system's learning performance. Experiments on spectral data with different background noises demonstrate the good noise-transferring ability of the proposed method against baseline systems ranging from wavelet denoising and deep neural networks to generative models. From this research, we posit that our method can enhance the performance of DL models by generating high-quality cases. The source code is made publicly available online at https://github.com/Magnomic/CNST.
Exploring Non-Isometric Alignment Inference for Representation Learning of Irregular Sequences
ABSTRACT. The development of Internet of Things (IoT) technology has led to increasingly diverse and complex data collection methods. This unstable sampling environment has resulted in the generation of a large number of irregular monitoring data streams, posing significant challenges for related data analysis tasks. Current approaches mainly focus on the uncertainty of data representations caused by the local non-isometricity in sequences. Previous works often design specific embedding structures tailored to specific tasks to make models adapt to the non-isometric nature of sequences and mitigate these negative effects. However, we have observed that irregular sequence sampling densities are uneven, containing randomly occurring dense and sparse intervals. This data imbalance tendency often leads to overfitting in the dense regions and underfitting in the sparse regions, ultimately impeding the representation performance of models. Conversely, the irregularity at the data level has limited impact on the deep semantics of sequences. Based on this observation, we propose a novel Non-isometric Alignment Inference Architecture (NAIA), which utilizes a multi-level semantic continuous representation structure based on inter-interval segmentation to learn representations of irregular sequences. This architecture efficiently extracts the latent features of irregular sequences. We evaluate the performance of NAIA on multiple datasets for downstream tasks and compare it with recent benchmark methods, demonstrating NAIA's state-of-the-art performance.
CPSSDS-R: Data stream semi-supervised classification algorithm based on conformal prediction
ABSTRACT. In this article, we consider the problem of semi-supervised data stream classification. Its main challenges include the fast arrival of samples, limited labeled data, and handling concept drift. Existing algorithms that detect concept drift constantly reinitialize the classifier, which is time-consuming and wasteful of space resources. Therefore, we propose CPSSDS-R, a semi-supervised classification algorithm for concept-drifting data streams based on model reuse. First, the labeled sample set in each data block is used to initialize the classification model. Second, when concept drift is detected during data iteration, the current model and the corresponding conformal prediction outputs of unlabeled samples are added to a classifier pool, and a new model is constructed. Then, component classifiers in the pool whose conformal prediction outputs are similar to those of the current data block are identified, and recurring concepts are detected with a distribution-based method. Finally, the classifier is updated and the model is incrementally updated according to the concept drift detection results. The algorithm is evaluated on multiple synthetic and real datasets, and its cumulative accuracy and per-block accuracy at different labeling ratios demonstrate its effectiveness in handling concept drift.
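For reference, a minimal sketch of the inductive conformal prediction step that produces per-class p-values for unlabeled samples (the base classifier and nonconformity score are generic choices, not necessarily those of CPSSDS-R):

import numpy as np
from sklearn.naive_bayes import GaussianNB

def conformal_p_values(clf, X_calib, y_calib, X_new):
    # Nonconformity score: 1 - predicted probability assigned to the (candidate) label.
    calib_scores = 1.0 - clf.predict_proba(X_calib)[np.arange(len(y_calib)), y_calib]
    new_probs = clf.predict_proba(X_new)
    p = np.zeros((len(X_new), len(clf.classes_)))
    for k in range(len(clf.classes_)):
        scores_k = 1.0 - new_probs[:, k]
        # p-value: fraction of calibration scores at least as nonconforming as the new score
        p[:, k] = ((calib_scores[None, :] >= scores_k[:, None]).sum(axis=1) + 1) / (len(calib_scores) + 1)
    return p

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(200, 5)), rng.integers(0, 2, 200)
X_calib, y_calib = rng.normal(size=(100, 5)), rng.integers(0, 2, 100)
clf = GaussianNB().fit(X_train, y_train)
p_values = conformal_p_values(clf, X_calib, y_calib, rng.normal(size=(10, 5)))
# High p-values mark unlabeled samples whose pseudo-labels can be trusted for self-training.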
ABSTRACT. Multilingual modeling has gained increasing attention in recent years, as cross-lingual Text-based Visual Question Answering (TextVQA) requires understanding questions and answers across different languages. Current research mainly works on multimodal information, assuming that multilingual pretrained models are effective for encoding questions. However, the semantic comprehension of a text-based question varies between languages, creating challenges in directly deducing its answer from an image. To this end, we propose a novel multilingual text-based VQA framework suited for cross-language scenarios (CLVQA), transductively considering multiple answer-generating interactions with questions. First, a question reading module densely connects encoding layers in a feedforward manner, which can adaptively work together with answering. Second, a multimodal OCR-based module decouples OCR features in an image into visual, linguistic, and holistic parts to facilitate the localization of a target-language answer. By incorporating enhancements from the above two input encoding modules, the proposed framework outputs its answer candidates mainly from the input image with an object detection module. Finally, a transductive answering module jointly understands the input multimodal information and the identified answer candidates at the multilingual level, autoregressively generating cross-lingual answers. Extensive experiments show that our framework outperforms state-of-the-art methods for both cross-lingual (English<->Chinese) and mono-lingual (English<->English and Chinese<->Chinese) tasks in terms of accuracy-based metrics. Moreover, significant improvements are achieved in zero-shot cross-lingual settings (French<->Chinese).
ABSTRACT. When collecting answers from crowds, if there are many instances, each worker can only provide answers to a small subset of the instances, and the instance-worker answer matrix is thus sparse. Solutions for improving the quality of crowd answers, such as answer aggregation, are usually proposed in an unsupervised fashion. In this paper, to enhance the quality of crowd answers used for inferring true answers, we propose a self-supervised solution to effectively learn the potential information in sparse crowd answers. We propose a method named \textsc{CrowdLR}, which first learns rich instance and worker representations from the crowd answers based on two types of self-supervised signals. We create a multi-task model with a Siamese structure to learn two classification tasks for the two self-supervised signals in one framework. We then utilize the learned representations to fill in the missing answers, and apply answer aggregation methods to the completed answers. Experimental results based on real datasets show that our approach can effectively learn representations from crowd answers and improve the performance of answer aggregation, especially when the crowd answers are sparse.
Exploring the Capability of ChatGPT for Cross-Linguistic Agricultural Document Classification: Investigation and Evaluation
ABSTRACT. In the sustainable smart agriculture era, a vast amount of agricultural knowledge is available on the internet, making it necessary to explore effective document classification techniques for enhanced accessibility and efficiency. Over the past few years, fine-tuning strategies based on pre-trained language models (PLMs) have gained popularity as mainstream deep learning approaches, showcasing impressive performance. However, these approaches face several challenges, including the limited availability of training data, poor domain transferability, lack of model interpretability, and the difficulty of deploying large models. Inspired by ChatGPT's significant success, we investigate its capability and utilization in the field of agricultural information processing. We explore various attempts to maximize ChatGPT's potential, including prompt construction strategies, ChatGPT question-answering (Q&A) inference, and an intermediate answer alignment technique. Our preliminary comparative study demonstrates that ChatGPT effectively addresses research challenges and bottlenecks, positioning it as an ideal solution for agricultural document classification. These findings encourage the development of a general-purpose agricultural document processing paradigm. Our preliminary study also indicates the trend towards achieving Artificial General Intelligence (AGI) for sustainable smart agriculture in the future. Code is available on GitHub: https://github.com/albert-jin/agricultural_textual_classification_ChatGPT.
Leveraging Sound Local and Global Features for Language-Queried Target Sound Extraction
ABSTRACT. Language-queried target sound extraction is a fundamental audio-language task that aims to estimate the audio signal of the target sound event class by a natural language expression in a sound mixture. One of the key challenges of this task is leveraging the language expression to highlight the target sound features in the noisy mixture interpretably. In this paper, we leverage language expression to guide the model to extract the most informative features of the target sound event by adaptively using local and global features, and we present a novel language-aware synergic attention network (LASA-Net) for language-queried target sound extraction, as the first attempt to leverage local and global operations using language representation to extract target sound in single or multiple sound source environments. In particular, language-aware synergic attention consists of a local operation submodule, a global operation submodule, and an interaction submodule, in which local and global operation submodules extract sound local and global features while the interaction submodule adaptively selects the most discriminative features with the guidance of linguistic features. In addition, we introduce a linguistic-acoustic fusion module that leverages the well-proven correlation modeling power of self-attention for excavating helpful multi-modal contexts. Extensive experiments demonstrate that our proposed LASA-Net is able to achieve state-of-the-art performance while maintaining an attractive computational complexity.
CM-TCN: Channel-aware Multi-scale Temporal Convolutional Networks For Speech Emotion Recognition
ABSTRACT. Speech emotion recognition (SER) plays a crucial role in understanding user intent and improving human-computer interaction (HCI). Currently, the most widely used and effective methods are based on deep learning, and temporal information has become increasingly important in SER. Although advanced deep learning components such as convolutional neural networks (CNN) and attention modules can achieve good results, they often ignore the temporal information in speech, which can lead to insufficient representations and low classification accuracy. To make full use of temporal features, we propose Channel-aware Multi-scale Temporal Convolutional Networks (CM-TCN). First, a channel-aware temporal convolutional network (CATCN) is used as the basic structure to extract multi-scale temporal features combined with channel information. Then, global feature attention (GFA) captures the global information at different time scales and enhances the important information. Finally, we use an adaptive fusion module (AFM) to establish the overall dependency across different network layers and fuse features. We conduct extensive experiments on six corpora, and the results demonstrate the superior performance of CM-TCN.
Self-Supervised Multimodal Representation Learning for Product Identification and Retrieval
ABSTRACT. Determining object similarity remains a persistent challenge in the field of data science. In the context of e-commerce retail, the identification of substitutable and similar products relies on similarity measures. Leveraging multimodal learning derived from real-world experience, humans are capable of recognizing similar products based solely on their titles, even in cases where significant literal differences exist. Motivated by this intuition, we propose a self-supervised mechanism that extracts strong prior knowledge from product images. This mechanism enhances the encoder's capacity for learning product representations in a multimodal framework, and the similarity between products is reflected by the distance between their respective representations. Additionally, we introduce a novel attention regularization to effectively direct attention towards product category-related signals. The proposed model exhibits wide applicability, as it can be effectively employed in unimodal tasks where only free-text inputs are available. To validate our approach, we evaluate our model on two key tasks: product similarity matching and retrieval. These evaluations are conducted on a real-world dataset consisting of thousands of diverse products. Experimental results demonstrate that multimodal learning significantly enhances language understanding capabilities within the e-commerce domain. Moreover, our approach outperforms strong unimodal baselines and recently proposed multimodal methods, further validating its superiority.
Time-warp-invariant Processing with Multi-spike Learning
ABSTRACT. Sensory signals are encoded and processed by neurons in the brain in the form of action potentials, also called spikes, that carry clue information across both spatial and temporal dimensions. Learning such clue information can be challenging, especially in the case of a long-delayed reward. This temporal credit assignment problem has been addressed by the concept of aggregate-label learning, which motivates the development of a family of multi-spike learning algorithms with demonstrated, remarkable learning performance. However, most current spike-based learning methods are developed without considering input temporal fluctuations, which constitute a common source of variability in sensory signals such as speech. Therefore, robust spike-based learning under fluctuations of both compression and dilation remains intriguing to explore. In this paper, we first show the time-warp-invariant characteristic of a conductance-based neuron model, based on which we then develop a new multi-spike learning algorithm for time-warp-invariant processing. Experimental results on speech recognition highlight the outstanding robustness of our algorithm against temporal distortions compared with other relevant spike-based methods. Our study thus confirms the effectiveness of multi-spike learning for time-warp robustness, extending the scope of spike-based processing and learning.
ABSTRACT. Feature-based knowledge distillation utilizes features from superior and complex teacher networks as knowledge to help portable student networks improve their generalization capability. Recent feature distillation algorithms focus on various feature processing and transmission methods while ignoring the flexibility of feature selection, resulting in limited distillation effects for students. In this paper, we propose Dynamic Feature Distillation to increase the flexibility of feature distillation by dynamically managing feature transfer sites. Our method leverages Online Feature Estimation to monitor the learning status of the student network in the feature dimension. Adaptive Position Selection then dynamically updates valuable feature transmission locations for efficient feature transmission. Notably, our approach can be easily integrated as a strategy for feature management into other feature-based knowledge transfer methods to improve their performance. We conduct extensive experiments on the CIFAR-100 and Tiny-ImageNet datasets to validate the effectiveness of Dynamic Feature Distillation.
NMPose: Leveraging Normal Maps for 6D Pose Estimation
ABSTRACT. Estimating the 6 degrees-of-freedom (6DoF) pose of an object from a single image is an important task in computer vision. Many recent works have addressed it by establishing 2D-3D correspondences and then applying a variant of the PnP algorithm. However, it is extraordinarily difficult to establish accurate 2D-3D correspondences for 6D pose estimation. In this work, we consider 6D pose estimation as a follow-up task to normal estimation so that pose estimation can benefit from the advance of normal estimation. We propose a novel 6D object pose estimation method, in which normal maps rather than 2D-3D correspondences are leveraged as alternative intermediate representations. In this paper, we illustrate the advantages of using normal maps for 6D pose estimation and also demonstrate that the estimated normal maps can be easily embedded into common pose recovery methods. On LINEMOD and LINEMOD-O, our method easily surpasses the baseline method and outperforms or rivals the state-of-the-art correspondence-based methods on common metrics. Our code is made publicly available.
Dynamical Graph Echo State Networks with Snapshot Merging for Spreading Process Classification
ABSTRACT. Dissemination Process Classification (DPC) is a popular application of temporal graph classification. The aim of DPC is to classify different spreading patterns of information or pestilence within a community represented by discrete-time temporal graphs. Recently, a reservoir computing-based model named Dynamical Graph Echo State Network (DynGESN) has been proposed for processing temporal graphs with relatively high effectiveness and low computational cost. In this study, we propose a model that combines a new data augmentation strategy, called snapshot merging, with DynGESN for DPC tasks. In our model, the snapshot merging strategy forms new snapshots by merging neighboring snapshots over time, and multiple reservoir encoders then capture spatiotemporal features from the merged snapshots. Finally, logistic regression decodes the sum-pooled embeddings into classification results. Experimental results on six benchmark DPC datasets show that our proposed model achieves better classification performance than DynGESN and several kernel-based models.
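A minimal sketch of snapshot merging on a discrete-time temporal graph (the union-style merge rule and window size are assumptions):

import numpy as np

def merge_snapshots(adjacency_seq, window=2):
    # adjacency_seq: (T, N, N) binary adjacency matrices of T snapshots.
    # Each merged snapshot is the edge union (element-wise max) of the last `window` snapshots.
    T = adjacency_seq.shape[0]
    merged = []
    for t in range(T):
        lo = max(0, t - window + 1)
        merged.append(adjacency_seq[lo:t + 1].max(axis=0))
    return np.stack(merged)

A = (np.random.rand(20, 50, 50) < 0.05).astype(float)   # toy temporal graph
A_merged = merge_snapshots(A, window=3)                  # fed to the reservoir encoders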
Many Is Better than One: Multiple Covariation Learning for Latent Multiview Representation
ABSTRACT. As one of the most compelling methods in multiview representation learning (MRL), canonical correlation analysis (CCA) and its variants have been widely applied in many fields. Due to the intrinsic linearity of covariance matrices, CCA can hardly reveal nonlinear relationships among features, and over the past few decades many variants of CCA have been developed to discover such relationships. However, the complexity and variety of relationships between features in practical applications, and the difficulty of representing them with ordinary nonlinear relationships, limit the representative capacity of these methods. To overcome this problem, we propose a multiple covariation projection (MCP) method, which can model composite relations to learn informative and compact multiview representations. Moreover, a multiset extension of MCP, dubbed MMCP, is developed to handle more than two views simultaneously. Extensive experimental results on five multiview datasets illustrate the effectiveness of our methods in multiview tasks such as classification and clustering.
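For reference, the linear CCA baseline that MCP generalizes can be run in a few lines with scikit-learn (toy data; MCP itself replaces the plain covariance with composite multiple-covariation relations):

import numpy as np
from sklearn.cross_decomposition import CCA

view1 = np.random.randn(500, 30)                    # e.g., features from one modality
view2 = view1 @ np.random.randn(30, 20) + 0.1 * np.random.randn(500, 20)

cca = CCA(n_components=5)
z1, z2 = cca.fit_transform(view1, view2)            # maximally correlated latent projections
corr = [np.corrcoef(z1[:, k], z2[:, k])[0, 1] for k in range(5)]
# Because the projections are linear, purely nonlinear dependencies between the
# views are invisible to this baseline, which motivates MCP's composite relations.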
ABSTRACT. Tigrinya, a language predominantly spoken in Eritrea and the Tigray region of Ethiopia, is classified as a low-resource language when it comes to Natural Language Processing (NLP), and its documents are not widely accessible due to the lack of printed material. Although the language has a rich cultural heritage, its literature has not been exposed to large-scale automated digitization compared to other widely spoken languages. In this paper, we design an end-to-end CRNN (Convolutional Recurrent Neural Network) to recognize machine-printed Tigrinya text from document images. This will help Tigrinya documents become more accessible and also bridge the gap with languages rich in NLP resources. We include all 304 characters in Tigrinya, and the network is trained on a total of over a million text-line images constructed from different domains. The majority of the data was synthesized to augment the limited real data and help the model generalize better. We employ two external datasets (ADOCR and GLOCR) in addition to ours to train the network. Furthermore, to improve the performance of the model, extensive parameter tuning was conducted. Without the use of post-processing techniques, the model achieves a 2.32% Character Error Rate (CER). The learning curve shows that, given more data, the model can further improve the CER. We finally obtain a lightweight model that achieves results comparable to the state of the art. This result implies that augmenting low-resource data with synthetic data can significantly reduce the error rate in text recognition, and that proper hyperparameter tuning can yield lightweight models without compromising much accuracy.
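A minimal PyTorch sketch of a CRNN text-line recognizer trained with CTC loss (layer sizes, image height, and the 305-class output including the CTC blank are placeholders, not the paper's exact configuration):

import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, num_classes=305, img_h=32):   # 304 characters + 1 CTC blank
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
        )
        feat_h = img_h // 4
        self.rnn = nn.LSTM(128 * feat_h, 256, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.fc = nn.Linear(512, num_classes)

    def forward(self, x):                      # x: (batch, 1, img_h, width)
        f = self.cnn(x)                        # (batch, 128, img_h/4, width/4)
        b, c, h, w = f.shape
        f = f.permute(0, 3, 1, 2).reshape(b, w, c * h)   # one feature vector per image column
        out, _ = self.rnn(f)
        return self.fc(out).log_softmax(-1)    # (batch, width/4, num_classes)

model, ctc = CRNN(), nn.CTCLoss(blank=0, zero_infinity=True)
images = torch.randn(4, 1, 32, 256)                     # toy text-line images
log_probs = model(images).permute(1, 0, 2)              # (T, batch, classes) for CTCLoss
targets = torch.randint(1, 305, (4, 20))                # toy character-index labels
loss = ctc(log_probs, targets,
           input_lengths=torch.full((4,), log_probs.size(0), dtype=torch.long),
           target_lengths=torch.full((4,), 20, dtype=torch.long))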
Correlated Online k-Nearest Neighbors Regressor Chain for Online Multi-Output Regression
ABSTRACT. Online multi-output regression is a crucial task in machine learning with applications in various domains such as environmental monitoring, energy efficiency prediction, and water quality prediction. This paper introduces CONNRC, a novel algorithm specifically designed to address the challenges of online multi-output regression and provide accurate real-time predictions. CONNRC builds upon the k-nearest neighbor algorithm in an online manner and incorporates a relevant chain structure to effectively capture and utilize correlations among structured multi-outputs. The main contribution of this work lies in the potential of CONNRC to enhance the accuracy and efficiency of real-time predictions across diverse application domains. In a comprehensive experimental evaluation on six real-world datasets, CONNRC is compared against five existing online regression algorithms. The results show that CONNRC consistently outperforms the other algorithms in terms of average Mean Absolute Error, demonstrating its superior accuracy in multi-output regression tasks. However, the time performance of CONNRC requires further improvement, indicating an area for future research and optimization.
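For reference, the batch analogue of this idea, a chain of k-nearest-neighbor regressors in which each output is predicted from the inputs plus previously predicted outputs, can be sketched with scikit-learn (CONNRC itself maintains the chain online):

import numpy as np
from sklearn.multioutput import RegressorChain
from sklearn.neighbors import KNeighborsRegressor

X = np.random.randn(300, 10)
W = np.random.randn(10, 3)
Y = X @ W + 0.1 * np.random.randn(300, 3)          # correlated multi-output targets

# Each regressor in the chain sees the original features plus earlier outputs,
# so correlations among the outputs are exploited during prediction.
chain = RegressorChain(KNeighborsRegressor(n_neighbors=5), order=[0, 1, 2])
chain.fit(X[:250], Y[:250])
Y_pred = chain.predict(X[250:])
mae = np.abs(Y_pred - Y[250:]).mean()              # average mean absolute error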
An improved target searching and imaging method for CSAR
ABSTRACT. Circular Synthetic Aperture Radar (CSAR) has attracted much attention in the field of high-resolution SAR imaging. To shorten the computation time and improve imaging quality, we propose a fast CSAR imaging strategy that searches for the target and automatically selects the area of interest for imaging. The first step is to find the target and select the imaging center and the imaging area of interest based on the target search algorithm; the second step is to divide the full-aperture data into sub-apertures according to the angle; the third step is to approximate the sub-apertures as linear arrays and image them separately; and the last step is to perform sub-image fusion to obtain the final CSAR image. This method greatly reduces the imaging time and obtains well-focused CSAR images. The proposed algorithm is verified by both simulation and real data collected with our mmWave imager prototype, which utilizes commercially available 77-GHz MIMO radar sensors. The experimental results verify the performance and superiority of our algorithm.
Syntax Tree Constrained Graph Network for Visual Question Answering
ABSTRACT. Visual Question Answering (VQA) aims to automatically answer natural language questions related to given image content. Existing VQA methods integrate vision modeling and language understanding to explore the deep semantics of the question. However, these methods ignore the significant syntax information of the question, which plays a vital role in understanding its essential semantics and guiding visual feature refinement. To fill this gap, we propose a novel Syntax Tree Constrained Graph Network (STCGN) for VQA based on entity message passing and syntax trees. This model is able to extract a syntax tree from questions and obtain more precise syntax information. Specifically, we parse questions and obtain the question syntax tree using the Stanford parsing tool. From the word level and phrase level, syntactic phrase features and question features are extracted using a hierarchical tree convolutional network. We then design a message-passing mechanism for phrase-aware visual entities and capture entity features according to a given visual context. Extensive experiments on the VQA2.0 dataset demonstrate the superiority of our proposed model.
Fast and Efficient Brain Extraction with Recursive MLP based 3D UNet
ABSTRACT. Extracting the brain from non-brain tissues is an essential step in neuroimage analyses such as brain volume estimation. Transformer and 3D UNet based methods achieve strong performance using attention and 3D convolutions, but they normally have complex architectures and are thus computationally slow. Consequently, they can hardly be deployed in computationally resource-constrained environments such as small neuroimage analysis clinics. To achieve rapid segmentation, the recent work UNeXt reduces convolution filters and presents Multilayer Perceptron (MLP) blocks that exploit simpler, linear MLP operations. To further boost performance, it shifts the feature channels in the MLP block so as to focus on learning local dependencies. However, it performs segmentation on 2D medical images rather than 3D volumes. In this paper, we propose a recursive MLP based 3D UNet to efficiently extract the brain from 3D head volumes. Our network combines 3D convolution blocks and MLP blocks to capture both long-range information and local dependencies, while leveraging the simplicity of MLPs to enhance computational efficiency. Unlike UNeXt, which extracts a single locality, we apply several shifts to capture multiple localities representing different local dependencies and introduce a recursive design to aggregate them. To save computational cost, the shifts do not introduce any parameters and the parameters are shared across recursions. Extensive experiments on two public datasets demonstrate the superiority of our approach over other state-of-the-art methods with respect to both accuracy and CPU inference time.
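A minimal PyTorch sketch of a shifted MLP block applied with several shift offsets and aggregated with shared parameters (shapes, shift sizes, and the token-axis shift are simplifying assumptions, not the exact model):

import torch
import torch.nn as nn

class RecursiveShiftMLP(nn.Module):
    def __init__(self, channels=64, shifts=(1, 2, 4)):
        super().__init__()
        self.shifts = shifts
        # One MLP shared across all shifts, so extra localities add no parameters.
        self.mlp = nn.Sequential(nn.Linear(channels, channels), nn.GELU(),
                                 nn.Linear(channels, channels))

    def forward(self, x):
        # x: (batch, tokens, channels), a flattened 3D feature volume.
        out = x
        for s in self.shifts:                       # recursive aggregation of localities
            shifted = torch.roll(out, shifts=s, dims=1)   # parameter-free shift
            out = out + self.mlp(shifted)
        return out

y = RecursiveShiftMLP()(torch.randn(2, 512, 64))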
Direct Inter-Intra View Association for Light Field Super-Resolution
ABSTRACT. Light field (LF) cameras record both the intensity and directions of light rays in a scene with a single exposure. However, due to the inevitable trade-off between the spatial and angular dimensions, the spatial resolution of LF images is limited, which makes LF super-resolution (LFSR) a research hotspot. The key to LFSR is complementation across views and the extraction of high-frequency information within each view. Due to the high dimensionality of LF data, previous methods usually model these two processes separately, which results in insufficient inter-view information fusion. In this paper, the LF Transformer is proposed for comprehensive perception of 4D LF data. The necessary inter-intra view correlations can be established directly inside each LF Transformer block, so it can handle the complex disparity variations of LF. Based on LF Transformers, 4DTNet is then designed, which comprehensively performs inter-intra view high-frequency information extraction. Extensive experiments on public datasets demonstrate that 4DTNet outperforms the current state-of-the-art methods both numerically and visually.
Dynamic Data Augmentation via Monte-Carlo Tree Search for Prostate MRI Segmentation
ABSTRACT. Medical image data are often limited due to the expensive acquisition and annotation process. Hence, training a deep-learning model with only raw data can easily lead to overfitting. One solution to this problem is to augment the raw data with various transformations, improving the model's ability to generalize to new data. However, manually configuring a generic augmentation combination and its parameters for different datasets is non-trivial due to inconsistent acquisition approaches and data distributions. Automatic data augmentation has therefore been proposed to learn favorable augmentation strategies for different datasets, but it incurs large GPU overhead. To this end, we present a novel method, called Dynamic Data Augmentation (DDAug), which is efficient and has negligible computation cost. DDAug develops a hierarchical tree structure to represent various augmentations and utilizes an efficient Monte-Carlo tree search algorithm to update, prune, and sample the tree. As a result, the augmentation pipeline can be optimized for each dataset automatically. Experiments on multiple prostate MRI datasets show that our method outperforms the current state-of-the-art data augmentation strategies.
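A toy sketch of the Monte-Carlo tree search loop over an augmentation tree may help fix ideas. The node structure, UCT constant, operation names, and reward below are hypothetical placeholders, not the DDAug definitions.

```python
import math

class Node:
    """A tree node holding one augmentation operation (names are hypothetical)."""
    def __init__(self, op, children=None):
        self.op = op
        self.children = children or []
        self.visits = 0
        self.value = 0.0          # running mean of the validation reward

def uct_select(node, c=1.4):
    """Pick the child with the highest UCT score (unvisited children first)."""
    def score(child):
        if child.visits == 0:
            return float("inf")
        return child.value + c * math.sqrt(math.log(node.visits + 1) / child.visits)
    return max(node.children, key=score)

def sample_pipeline(root):
    """Walk root-to-leaf, collecting one augmentation pipeline and the visited path."""
    node, path = root, [root]
    while node.children:
        node = uct_select(node)
        path.append(node)
    return [n.op for n in path[1:]], path

def backpropagate(path, reward):
    """Update visit counts and running mean rewards along the sampled path."""
    for node in path:
        node.visits += 1
        node.value += (reward - node.value) / node.visits

# Example: a tiny tree of spatial and intensity augmentations.
root = Node("root", [Node("flip", [Node("gamma"), Node("noise")]), Node("rotate")])
pipeline, path = sample_pipeline(root)
backpropagate(path, reward=0.72)   # the reward would come from validation Dice, for instance
```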
A Federated Multi-Stage Light-Weight Vision Transformer for Respiratory Disease Detection
ABSTRACT. Artificial Intelligence (AI)-based computer-aided diagnosis (CAD) has been widely applied to assist medical professionals in several medical applications.
Although there are many studies on respiratory disease detection from radiographic images using Deep Learning (DL) approaches, the limited availability of public datasets restricts their interpretation and generalization capacity. However, radiography images are available through different organizations in various countries. This setting is well suited to Federated Learning (FL), in which different institutes collaborate to train a global model on their private data. In FL, the local model on the client side is critical because there must be a balance between the model's accuracy, communication cost, and client-side memory usage. Current DL or Vision Transformer (ViT)-based models have large numbers of parameters, making client-side memory and communication costs a significant bottleneck when applied to FL training. The existing state-of-the-art (SOTA) FL techniques for respiratory disease detection either use small CNNs with insufficient accuracy or assume clients have sufficient processing capacity to train large models, which remains a significant challenge in practical applications. In this study, we address one question: is it possible to maintain high accuracy while lowering the number of model parameters, leading to lower memory requirements and communication costs? To address this problem, we propose a federated multi-stage light-weight ViT framework that combines the strengths of CNNs and ViTs to build an efficient FL framework. We conduct extensive experiments and show that the proposed framework outperforms a set of current SOTA models in FL training with higher accuracy while lowering communication costs and memory requirements. We adapt Grad-CAM for infection localization and compare the results with an experienced radiologist's findings. Upon acceptance, our code and the dataset used will be available on our GitHub account.
Learning Dense UV Completion for 3D Human Mesh Recovery
ABSTRACT. Human mesh reconstruction from a single image is a challenging task due to self-occlusion and occlusion by objects or other humans. Existing methods either fail to separate human features accurately or lack proper supervision for feature completion. In this paper, we propose Dense Inpainting Human Mesh Recovery (DIMR), a two-stage method that leverages dense correspondence maps to handle occlusion.
Our method utilizes a dense correspondence map to separate visible human features and completes human features in a structured UV space with an attention-based feature completion module. We also design a feature inpainting training procedure that guides the network to learn from unoccluded features. We evaluate our method on several datasets and demonstrate its superior performance in heavily occluded scenarios compared with other methods: extensive experiments show that it clearly outperforms prior SOTA methods on heavily occluded images, while achieving comparable results on standard benchmarks (3DPW) and on images without heavy occlusion.
Aspect-level sentiment analysis using dual probability graph convolutional networks (DP-GCN) integrating multi-scale information
ABSTRACT. Aspect-based sentiment analysis (ABSA) is a fine-grained, entity-level sentiment analysis task that aims to identify the emotions associated with specific aspects or details within text. ABSA has been widely applied to areas such as analyzing product reviews and monitoring public opinion on social media. In recent years, methods based on graph neural networks combined with syntactic information have achieved promising results on the ABSA task. However, existing methods using syntactic dependency trees contain redundant information, and relationships with identical weights do not reflect the importance of the dependencies between aspect words and opinion words. Moreover, ABSA is limited by issues such as short sentence length and informal expression. Therefore, this paper proposes a Dual Probability Graph Convolutional Network (DP-GCN) integrating multi-scale information to address these issues. Firstly, the original dependency tree is reshaped through pruning, creating an aspect-based syntactic dependency tree with corresponding syntactic dependency weights. Next, two probability attention matrices are constructed based on semantic and syntactic information, respectively. The semantic probability attention matrix represents a weighted directed graph of semantic correlations between words. Compared with the discrete adjacency matrix directly constructed from the syntactic dependency tree, the probability matrix representing the dependency relationships between words contains richer syntactic information. Based on this, semantic information and syntactic dependency information are separately extracted via graph convolutional networks. Interactive attention is used to guide mutual learning between semantic information and syntactic dependency information, enabling full interaction and fusion of both types of information before the final sentiment polarity classification. Our model was tested on four public datasets: Restaurant, Laptop, Twitter, and MAMS. The accuracy (ACC) and F1 score improved by 0.14% to 1.26% and 0.4% to 2.19%, respectively, indicating its outstanding performance.
Staged Long Text Generation with Progressive Task-Oriented Prompts
ABSTRACT. Generating coherent and consistent long text remains a challenge for artificial intelligence. The state-of-the-art paradigm partitions the whole generation process into successive stages; however, the content plan applied in each stage may be error-prone, and fine-tuning a large-scale language model for each stage is resource-consuming. In this paper, we follow the above paradigm and devise three stages: keyphrase decompression, transition paraphrase, and text generation. We leverage task-oriented prompts to direct the production of text in each stage, which improves the quality of the generated text. Further, we propose a new content plan representation with elastic mask tokens to reduce model bias and irregular words. Moreover, we introduce length control and commonsense knowledge prompts to increase the adaptability of the proposed model. Extensive experiments conducted on two challenging tasks demonstrate that our model outperforms strong baselines significantly and is able to generate longer, high-quality texts with fewer parameters.
Chinese Medical Intent Recognition Based on Multi-feature Fusion
ABSTRACT. The popularity of online query services heightens the need for methods that accurately understand the true intention behind a query. Currently, most medical query intention recognition methods are deep learning-based. However, due to the inadequate medical-domain corpus available in the pre-training phase, these methods fail to accurately extract text features built on medical domain knowledge, and they cannot fully capture the query intention because they rely on a single technique to extract textual information. To mitigate these issues, we propose a novel intent recognition model called EDCGA (ERNIE-Health+D-CNN+Bi-GRU+Attention) in this paper. EDCGA obtains text representations from the word vectors of the pre-trained ERNIE-Health model and employs a D-CNN to expand the receptive field for extracting local features. Furthermore, it combines a Bi-GRU with an attention mechanism to extract global information and enhance the understanding of the intent. Extensive experimental results on multiple datasets demonstrate that our proposed model exhibits superior recognition performance compared to the baselines.
ABSTRACT. The advent of non-autoregressive machine translation (NAT) has greatly improved the decoding speed over autoregressive machine translation (AT), while bringing about a performance decrease. Semi-autoregressive neural machine translation (SAT), as a compromise, enjoys the advantages of both autoregressive and non-autoregressive decoding. However, current SAT methods face the challenges of information-limited initialization and rigorous termination. This paper develops a layer-and-length-based syntactic labeling method and introduces a syntactic dependency parsing structure-guided two-stage semi-autoregressive translation (SDPSAT) model, which addresses the above challenges with syntax-based initialization and termination. Additionally, we present a Mixed Training strategy to shrink exposure bias. Experiments on six widely used datasets show that our SDPSAT model outperforms traditional SAT models with reduced word repetition and achieves results competitive with the AT baseline at a 2×–3× speedup.
Topic-aware Two-layer Context-enhanced Model for Chinese Discourse Parsing
ABSTRACT. In the past decade, Chinese Discourse Parsing has drawn much attention due to its fundamental role in document-level Natural Language Processing (NLP).
In this work, we propose a topic-aware two-layer context-enhanced model based on a transition system. Specifically, on the one hand, we adopt a two-layer context-enhanced Chinese discourse parser as a strong baseline, in which the Star-Transformer with a star topology is employed to enhance the EDU representation. On the other hand, we split the document into multiple sub-topics based on changes in the nuclearity of discourse relations and then implicitly incorporate topic boundary information via a joint learning framework.
Recurrent Update Representation based on Multi-Head Attention Mechanism for Joint Entity and Relation Extraction
ABSTRACT. Joint extraction of entities and relations from unstructured text is an important task in information extraction and knowledge graph construction. However, most existing work only considers the contextual information of the sentence and the information of the entities, paying little attention to the possible relations between the entities, which may lead to a failure to extract valid triplets. In this paper, we propose a recurrent representation-update method based on a multi-head attention mechanism for relation extraction. We use a multi-head attention mechanism to let the relation representation and the sentence context representation interact, and we fully integrate the feature information of both by cyclically updating the representations. The model performs relation extraction after the representations have been updated. With this approach we are able to leverage the relation information between entities for relational triple extraction. Experimental results on four public datasets show that our approach is effective and that the model outperforms all baseline models.
How Legal Knowledge Graph Can Help Predict Charges for Legal Text
ABSTRACT. Existing methods for predicting Easily Confused Charges (ECC) primarily rely on the factual descriptions of legal cases. However, these approaches overlook key information hidden in those descriptions, making it impossible to accurately differentiate between ECC. Legal domain knowledge graphs can represent personal information and criminal processes in cases, but they mainly focus on the entities in a case in isolation while ignoring the logical relationships between these entities, and different relationships often lead to distinct charges. To address these problems, this paper proposes a charge prediction model that integrates a Criminal Behavior Knowledge Graph (CBKG), called Charge Prediction Knowledge Graph (CP-KG). Firstly, we define a diverse range of legal entities and relationships based on the characteristics of ECC and conduct fine-grained annotation of the key elements and logical relationships in the factual descriptions. Subsequently, we match the descriptions against the CBKG to extract the key elements, which are then encoded by a Text Convolutional Neural Network (TextCNN). Additionally, we extract case subgraphs containing sequential behaviors from the CBKG based on the factual descriptions and encode them using a Graph Attention Network (GAT). Finally, we concatenate the representations of the key elements, case subgraphs, and factual descriptions, and use them jointly to predict the charges against the defendant. To evaluate CP-KG, we conduct experiments on two charge prediction datasets consisting of real legal cases. The experimental results demonstrate that CP-KG achieves Macro-F1 scores of 99.10% and 90.23% on the two datasets, respectively. Compared with the baseline methods, CP-KG shows significant improvements of 25.79% and 13.82%, respectively.
ABSTRACT. Legal Judgment Prediction (LJP) is a critical task that aims to predict charges, articles, and terms of penalties based on the fact descriptions provided in criminal cases. However, current LJP methods often fail to fully utilize the important aspect of legal event information, leading to suboptimal predictions. In order to address this issue, our proposed model introduces a legal event type attention mechanism, which effectively identifies key event information within the fact descriptions. By combining event-aware and event-free representations, our framework enables a comprehensive understanding of the fact descriptions, resulting in improved performance on LJP. Importantly, our approach outperforms state-of-the-art models, achieving an average improvement of 3.86% in the prediction of articles, 1.82% in the prediction of charges, and 5.24% in the prediction of terms of penalties.
STA-Net: Reconstruct Missing Temperature Data of Meteorological Stations Using a Spatiotemporal Attention Neural Network
ABSTRACT. Reconstructing missing temperature data from meteorological stations is of great significance for analyzing climate change and predicting related natural disasters, but it is a tricky and urgent problem. In the past, various interpolation methods were used to address it, but these methods largely ignored the temporal correlation of each station itself. Recently, machine-learning-based methods have been widely studied for this problem; however, they tend to handle missing values at a single station, neglecting the spatial correlation between stations. Hence, we put forward a new spatiotemporal attention neural network (STA-Net) for reconstructing missing data at multiple meteorological stations. The STA-Net adopts a state-of-the-art encoder-decoder deep learning architecture and is composed of two subnetworks, a local spatial attention mechanism (LSAM) and a multidimensional temporal self-attention mechanism (MTSAM). Moreover, a multi-station data processing method is developed to generate matrix datasets containing spatiotemporal information, so that the STA-Net can be trained and tested. To evaluate the STA-Net, extensive experiments on real Tibet and Qamdo datasets with missing rates of 25%, 50%, and 75% are conducted, and the results are compared with U-Net, PConvU-Net, and BiLSTM. Experimental results show that our data processing method is effective and that the STA-Net achieves better reconstruction performance. With a missing rate of 25% on the Tibet test set, compared with the other three methods, the MAE declines by 60.21%, 36.42%, and 12.70%; the RMSE declines by 56.28%, 32.03%, and 14.17%; and the R2 increases by 0.75%, 0.20%, and 0.07%, respectively.
CLF-AIAD: A Contrastive Learning Framework for Acoustic Industrial Anomaly Detection
ABSTRACT. Acoustic Industrial Anomaly Detection (AIAD) has received a great deal of attention as a technique to discover faults or malicious activity, allowing for preventive measures to be more effectively targeted. The essence of AIAD is to learn the compact distribution of normal acoustic data and detect outliers as anomalies during testing. However, recent AIAD work does not capture the dependencies and dynamics of Acoustic Industrial Data (AID). To address this issue, we propose a novel Contrastive Learning Framework (CLF) for AIAD, known as CLF-AIAD. Our method introduces a multi-grained contrastive learning-based framework to extract robust normal AID representations. Specifically, we first employ a projection layer and a novel context-based contrast method to learn robust temporal vectors. Building upon this, we then introduce a sample-wise contrasting-based module to capture local invariant characteristics, improving the discriminative capabilities of the model. Finally, a transformation classifier is introduced to bolster the performance of the primary task under a self-supervised learning framework. Extensive experiments on two typical industrial datasets, MIMII and ToyADMOS, demonstrate that our proposed CLF-AIAD effectively detects various real-world defects and improves upon the state-of-the-art in unsupervised industrial anomaly detection.
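As a rough illustration of the sample-wise contrastive module, the following PyTorch sketch shows a standard NT-Xent loss over two augmented views of each clip in a batch; it is a generic stand-in under assumed shapes, not CLF-AIAD's exact multi-grained formulation.

```python
import torch
import torch.nn.functional as F

def ntxent_loss(z1, z2, temperature=0.1):
    """z1, z2: (N, D) embeddings of two augmented views of the same N clips."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)                       # (2N, D)
    sim = z @ z.t() / temperature                        # scaled cosine similarities
    n = z1.size(0)
    sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool, device=z.device), float("-inf"))
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)                 # the positive is the paired view

# Usage: embeddings of two spectrogram augmentations of the same normal clips.
loss = ntxent_loss(torch.randn(16, 128), torch.randn(16, 128))
```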
Multimodal Isotropic Neural Architecture with Patch Embedding
ABSTRACT. Patch embedding has been a significant advancement in Transformer-based models, particularly the Vision Transformer (ViT), as it enables handling larger image sizes and mitigating the quadratic runtime of self-attention layers in Transformers. Moreover, it allows for capturing global dependencies and relationships between patches, enhancing effective image understanding and analysis. However, it is important to acknowledge that Convolutional Neural Networks (CNNs) continue to excel in scenarios with limited data availability.
Furthermore, their efficiency in terms of memory usage and latency makes them particularly suitable for deployment on edge devices, underlining their practical significance.
Expanding upon this, we propose Minape, a novel multimodal isotropic convolutional neural architecture that incorporates patch embedding. Minape extends the application of patch embedding to both time series and image data for classification purposes.
By employing isotropic models, Minape addresses the challenges posed by the varying sizes and complexities of the data. It groups samples by modality type, creating two-dimensional representations that undergo linear embedding before being processed by a scalable isotropic convolutional network architecture. The outputs from these pathways are merged to capitalize on the complementary information between modalities, and a temporal classifier is then trained on the merged representations to distinguish between different classes.
Experimental results demonstrate that Minape significantly outperforms existing approaches in terms of accuracy while requiring fewer than 1M parameters and occupying less than 12 MB. This performance was observed on multimodal benchmark datasets and on our newly collected multi-dimensional multimodal dataset, Mudestreda, obtained from real industrial processing devices\footnote{Link to code and dataset: \url{https://anonymous.4open.science/r/Minape-ED25}}.
Generating Spatiotemporal Trajectories with GANs and Conditional GANs
ABSTRACT. Modeling the movements of individuals and populations and generating synthetic spatiotemporal trajectory data play an important role in many (privacy-aware) analyses and applications, such as urban planning and route navigation. A key challenge in trajectory generation is to best capture the basic characteristics of long sequences of location points, which is non-trivial considering the inherent sequentiality and high dimensionality of trajectory data. This paper presents TS-TrajGAN, a two-stage model that generates spatiotemporal trajectory data by combining a Generative Adversarial Network (GAN) and a conditional GAN. The GAN of stage I is trained to simulate the distribution of the initial trajectory segments, so that the basic characteristics of the length-limited initial segments can be well depicted. In stage II, the conditional GAN is used to predict the next location point of the currently generated trajectory and preserve the variability of individuals' mobility. In addition, a predictor network is added to the stage-I GAN for trajectory length prediction. Experiments on a real-world taxi dataset demonstrate that TS-TrajGAN not only generates trajectories with characteristics similar to the real ones, but also outperforms the state-of-the-art methods in terms of data utility. Our code is available at https://github.com/kfZhao726/TS-TrajGAN.
Trajectory Prediction with Contrastive Pre-training and Social Rank Fine-tuning
ABSTRACT. This paper focuses on the accurate prediction of pedestrian trajectories in scenarios where individuals walk alone or in social groups, and sometimes alter their paths to avoid collisions. While previous work has improved backbone neural networks to model individual motion patterns, few studies have explicitly addressed the consistency of internal motion patterns or properness of external interactions. To address this, we propose a unified framework consisting of a Contrastive History-Prediction (CHIP) module and a Differentiable Social Interaction Ranking (DSIR) module. The CHIP module utilizes unsupervised contrastive loss to optimize predicted motion patterns consistent with observations, while the supervised DSIR module ensures predicted interactions are compatible with realistic positions. Our analysis and numerical studies demonstrate the effectiveness of our approach, which achieves a 5-10% improvement in positional accuracy and a 3-7% boost in interactive properness. We provide comprehensive visualizations of anticipated trajectories with temporal interactive scores across various scenarios.
Dynamic Knowledge Distillation for Reduced Easy Examples
ABSTRACT. Knowledge distillation is usually performed by promoting a small model (student) to mimic the knowledge of a large model (teacher). Current knowledge distillation methods mainly focus on the extraction and transformation of knowledge while ignoring the importance of individual examples in the dataset, assigning equal weight to each example. To alleviate this problem, we propose Dynamic Knowledge Distillation (Dy-KD), which incorporates a curriculum strategy to selectively discard easy examples during knowledge distillation. Specifically, we estimate the difficulty of examples from the predictions of the stronger teacher network and divide the examples in a dataset into easy and hard ones. These examples are then given different weights to adjust their contributions to knowledge transfer. We validate Dy-KD on CIFAR-100 and Tiny-ImageNet; the experimental results show that (1) using the curriculum strategy to discard easy examples prevents the model's fitting capacity from being consumed by easy examples, and (2) assigning different weights to hard and easy examples makes the model emphasize hard examples, which boosts student performance. At the same time, our method can easily be built on existing distillation methods.
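A minimal sketch of the weighting idea follows, with an assumed confidence threshold standing in for the paper's curriculum schedule: the teacher's confidence on the ground-truth class rates example difficulty, easy examples are down-weighted or dropped, and hard examples dominate the distillation loss.

```python
import torch
import torch.nn.functional as F

def dynamic_kd_loss(student_logits, teacher_logits, labels,
                    T=4.0, easy_thresh=0.95, easy_weight=0.0, hard_weight=1.0):
    """Per-example weighted distillation; thresholds/weights are illustrative assumptions."""
    teacher_prob = F.softmax(teacher_logits, dim=1)
    conf = teacher_prob.gather(1, labels.unsqueeze(1)).squeeze(1)   # teacher confidence on the true class
    weights = torch.where(conf > easy_thresh,                       # easy examples get a small (or zero) weight
                          torch.full_like(conf, easy_weight),
                          torch.full_like(conf, hard_weight))
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="none").sum(dim=1) * (T * T)             # per-example KD term
    ce = F.cross_entropy(student_logits, labels, reduction="none")   # per-example supervised term
    return (weights * (kd + ce)).mean()
```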
Improving Out-of-Distribution Detection with Margin-Based Prototype Learning
ABSTRACT. Deep Neural Networks often make overconfident predictions when encountering out-of-distribution (OOD) data. Previous prototype-based methods significantly improved OOD detection performance by optimizing the representation space. However, practical scenarios present a challenge: OOD samples near class boundaries may overlap with in-distribution samples in the feature space, resulting in misclassification, and few methods have considered this challenge. In this work, we propose a margin-based method that introduces a margin into the common instance-prototype contrastive loss. The margin leads to broader decision boundaries, resulting in better distinguishability of OOD samples. In addition, we leverage learnable prototypes and explicitly maximize prototype dispersion to obtain an improved representation space. We validate the proposed method on several common benchmarks with different scoring functions and architectures. Experimental results show that the proposed method achieves state-of-the-art performance.
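The margin-augmented instance-prototype loss can be sketched as follows; the margin placement, temperature, and dispersion term are illustrative assumptions rather than the paper's exact objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MarginPrototypeLoss(nn.Module):
    """Instance-prototype contrastive loss with a margin plus a prototype-dispersion term."""
    def __init__(self, num_classes, feat_dim, margin=0.3, temperature=0.1, disp_weight=0.1):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_classes, feat_dim))  # learnable class prototypes
        self.margin, self.t, self.disp_weight = margin, temperature, disp_weight

    def forward(self, features, labels):
        z = F.normalize(features, dim=1)
        p = F.normalize(self.prototypes, dim=1)
        logits = z @ p.t()                                                   # cosine similarity to each prototype
        logits = logits - self.margin * F.one_hot(labels, p.size(0)).to(logits.dtype)  # additive margin on the true class
        contrastive = F.cross_entropy(logits / self.t, labels)
        proto_sim = p @ p.t()
        off_diag = proto_sim - torch.diag(torch.diag(proto_sim))
        dispersion = off_diag.sum() / (p.size(0) * (p.size(0) - 1))          # mean inter-prototype similarity
        return contrastive + self.disp_weight * dispersion                   # minimizing this spreads prototypes apart
```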
MOC: Multi-modal Sentiment Analysis via Optimal Transport and Contrastive Interactions
ABSTRACT. Multi-modal sentiment analysis (MSA) aims to utilize information from various modalities to improve the classification of emotions. Most existing studies employ attention mechanisms for modality fusion, overlooking the heterogeneity of different modalities. To address this issue, we propose an approach that leverages optimal transport for modality alignment and fusion, specifically focusing on distributional alignment. However, solely relying on the optimal transport module may result in a deficiency of intra-modal and inter-sample interactions. To tackle this deficiency, we introduce a double-modal contrastive learning module. Specifically, we propose a model MOC (Multi-modal sentiment analysis via Optimal transport and Contrastive interactions), which integrates optimal transport and contrastive learning. Through empirical comparisons on three established multi-modal sentiment analysis datasets, we demonstrate that our approach achieves state-of-the-art performance. Additionally, we conduct extended ablation studies to validate the effectiveness of each proposed module.
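For readers unfamiliar with the optimal-transport component, the sketch below runs entropic Sinkhorn iterations between two sets of modality tokens and uses the resulting plan to transport one modality onto the other. It only illustrates the general alignment idea; MOC's actual cost function and fusion design may differ.

```python
import torch
import torch.nn.functional as F

def sinkhorn(cost, eps=0.1, n_iters=50):
    """Entropic OT: cost (n, m) -> transport plan (n, m) with uniform marginals."""
    n, m = cost.shape
    a, b = torch.full((n,), 1.0 / n), torch.full((m,), 1.0 / m)
    K = torch.exp(-cost / eps)                 # Gibbs kernel
    u = torch.ones(n)
    for _ in range(n_iters):                   # alternating marginal scaling
        v = b / (K.t() @ u)
        u = a / (K @ v)
    return u.unsqueeze(1) * K * v.unsqueeze(0)

# Example: align text and audio token features with a cosine cost, then fuse
# the audio tokens transported onto the text tokens.
text, audio = torch.randn(8, 64), torch.randn(12, 64)
cost = 1 - F.normalize(text, dim=1) @ F.normalize(audio, dim=1).t()
plan = sinkhorn(cost)
fused = text + (plan / plan.sum(dim=1, keepdim=True)) @ audio
```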
A Distributed Projection-based Algorithm with Local Estimators for Optimal Formation of Multi-robot System
ABSTRACT. In general, the optimal formation problem can be modeled as a standard constrained optimization problem according to shape theory. By adding local supplementary estimators, it can be further modeled as a distributed constrained optimization problem, and a distributed projection-based algorithm is designed to solve it. The aim of the algorithm is to drive a group of robots to the desired geometric pattern while minimizing the total travel distance of the robots from their initial positions. It is worth noting that, as long as the communication graph among the robots is undirected and connected, the global convergence of the algorithm is guaranteed. Moreover, all the robots finally form the desired formation within the limited space. Finally, simulation results are provided to verify the effectiveness of the proposed distributed algorithm.
A Memory Optimization Method for Distributed Training
ABSTRACT. In recent years, with the continuous development of artificial intelligence technology, the complexity of deep learning algorithms and the scale of model training have kept increasing, and distributed training has become an effective way to train large-scale models. A series of efficient pipeline-parallel training methods have emerged to improve training speed and accuracy, yet memory consumption and load balance remain bottlenecks. To address these issues, we propose an efficient pipeline-parallel training optimization method that processes small batches of data in parallel across multiple compute nodes in a pipelined manner. We propose a prefix-sum partition algorithm to achieve a balanced partition and save the memory of computing resources. We also design a clock optimization strategy that limits the number of weight versions generated, ensuring the model's accuracy. Compared with well-known pipeline-parallel frameworks, our method achieves about 2 times training acceleration, saves about 30\% of memory consumption, and improves model accuracy by about 10\% compared with PipeDream.
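The prefix-sum partition can be illustrated with a few lines of Python: given per-layer costs, stage boundaries are placed where the cumulative cost crosses equal fractions of the total. The cost model and the clamping rules here are assumptions for illustration, not the paper's exact algorithm.

```python
import bisect
from typing import List

def prefix_sum_partition(costs: List[float], n_stages: int) -> List[List[int]]:
    """Assign consecutive layers to stages so each stage gets roughly total/n_stages work."""
    prefix = [0.0]
    for c in costs:
        prefix.append(prefix[-1] + c)          # prefix[i] = cost of layers [0, i)
    total = prefix[-1]
    boundaries = [0]
    for s in range(1, n_stages):
        target = total * s / n_stages          # ideal cumulative cost at this boundary
        idx = bisect.bisect_left(prefix, target)
        # keep at least one layer per stage
        idx = min(max(idx, boundaries[-1] + 1), len(costs) - (n_stages - s))
        boundaries.append(idx)
    boundaries.append(len(costs))
    return [list(range(boundaries[i], boundaries[i + 1])) for i in range(n_stages)]

# Example: 8 layers with uneven costs split over 4 pipeline stages.
print(prefix_sum_partition([1, 3, 2, 2, 5, 1, 1, 4], 4))
```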
Generalizing Graph Network Models for the Traveling Salesman Problem with Lin-Kernighan-Helsgaun Heuristics
ABSTRACT. Existing graph convolutional network (GCN) models for the traveling salesman problem (TSP) cannot generalize well to TSP instances with a larger number of cities than the training samples, and the NP-hard nature of the TSP makes it impractical to use large-scale instances for training. This paper proposes a novel approach that generalizes a GCN model pre-trained on a fixed small TSP size to large-scale instances with the help of Lin-Kernighan-Helsgaun (LKH) heuristics. This is realized by first devising a Sierpinski partition scheme to split a large TSP into sub-problems that can be efficiently solved by the pre-trained GCN, and then developing an attention-based merging mechanism to integrate the sub-solutions into a whole solution to the original TSP instance. Specifically, we train a GCN model by supervised learning to produce edge-prediction heat maps of small-scale TSP instances and then apply it to the sub-problems of a large TSP instance generated by the partition strategy. Controlled by an attention mechanism, all the heat maps of the sub-problems are merged into a complete one to construct the edge candidate set for LKH. Experiments show that this new approach significantly enhances the generalization ability of the pre-trained GCN model without using labeled large-scale TSP instances in training and also outperforms LKH under the same time limit.
Deep Hashing for Multi-label Image Retrieval with Similarity Matrix Optimization of Hash Centers and Anchor Constraint of Center Pairs
ABSTRACT. Deep hashing can improve computational efficiency and save storage space, making it a significant component of the image retrieval task, and it has received extensive research attention. Existing deep hashing frameworks mainly fall into two categories: single-stage and two-stage. For multi-label image retrieval, most single-stage and two-stage deep hashing methods consider two images to be similar if any pair of their corresponding category labels is the same, and thus do not make full use of the multi-label information. Meanwhile, some novel two-stage deep hashing methods proposed in recent years first construct hash centers and then train deep neural networks. For multi-label processing, these two-stage methods usually convert the multi-label objective into a single-label one, which also leads to insufficient use of label information. In this paper, a novel multi-label deep hashing method is proposed that constructs the hash centers by building a similarity matrix and designing an optimization algorithm, and builds the training loss function from the multi-label hash center constraint and an anchor constraint on center pairs. Experiments on several multi-label image benchmark datasets show that the proposed method achieves state-of-the-art results.
ABSTRACT. Designing incentive-compatible and revenue-maximizing auctions is pivotal in mechanism design. Often referred to as optimal auction design, the area has seen little theoretical breakthrough since Myerson's seminal 1981 work. Setting general combinatorial auctions aside, we do not even know the optimal auction for selling as few as two distinct items to more than one bidder. In recent years, the stagnation of theoretical progress has prompted many to use deep learning models to find near-optimal auction mechanisms. In this paper, we provide two general methods to improve such deep learning models. Firstly, we propose a new data sampling method that achieves better coverage and utilization of the possible data. Secondly, we propose a more fine-grained neural network architecture. Unlike existing models, which output a single payment percentage for each bidder, the refined network outputs a separate payment percentage for each item. Such an item-wise approach captures the interaction among bidders at a more granular level than previous models. We conducted comprehensive and in-depth experiments to test our methods and observed improvements in all tested models over their original designs. Notably, we achieved state-of-the-art performance by applying our methods to an existing model.
A Bi-Directional Optimization Network for De-Obscured 3D High-Fidelity Surface Reconstruction
ABSTRACT. 3D detailed face reconstruction based on monocular images aims to reconstruct a 3D face from a single image with rich face detail. The existing methods have achieved significant results, but still suffer from inaccurate face geometry reconstruction and artifacts caused by mistaking hair for wrinkle information. To address these problems, we propose a bi-directional optimization network for de-obscured 3D high-fidelity surface reconstruction. Specifically, our network is divided into two stages: face geometry fitting and face detail optimization. In the first stage, we design a global and local bi-directional optimized feature extraction network that uses both local and global information to jointly constrain the face geometry and ultimately achieve an accurate 3D face geometry reconstruction. In the second stage, we decouple the hair and the face using a segmentation network and use the distribution of depth values in the facial region as a prior for the hair part, after which the FPU-net detail extraction network we designed is able to reconstruct finer 3D face details while removing the hair occlusion problem. With only a small number of training samples, extensive experimental results on multiple evaluation datasets show that our method achieves competitive performance and significant improvements over state-of-the-art methods.
On the use of persistent homology to control the generalization capacity of a neural network
ABSTRACT. Analyzing the generalization capacity of neural networks (NNs) is crucial to ensure that the model has truly learned and can perform well on unseen data, rather than being limited to the training data. However, the ordinary approach of evaluating an NN's performance on multiple testing datasets can be both costly and time-consuming, as it requires obtaining, pre-processing, and labeling new testing datasets. The key problem is to find the right capacity for the number of training observations: the learning system's capacity must be adjusted both to the task and to the information provided by the data to obtain the best generalization. The work presented in this paper is set in this context and applies techniques from algebraic topology and relevance measures to study the behaviour of the NN during learning. We define the NN on a topological space as a functional topology graph, and a set of topological summaries is then calculated to estimate the generalization gap. This estimation is carried out in parallel with an assessment of the relevance of NN units, including a progressive pruning of the network units. During this pruning, the generalization gap estimation enables us to detect overfitting and thus to determine when to perform early stopping and to identify the architecture offering the best generalization. Our approach provides a more comprehensive understanding of NN generalization capacity and can be used to investigate the extensibility and interpretability of an NN.
Encrypted-SNN: A Privacy-Preserving Method for converting Artificial Neural Networks to Spiking Neural Networks
ABSTRACT. The conversion from Artificial Neural Networks (ANNs) to Spiking Neural Networks (SNNs) poses a significant challenge, and preserving privacy during the conversion process is crucial to protect sensitive information in the data. This work proposes a novel Encrypted-SNN to address the privacy problems in ANN-to-SNN conversion (ANN-SNN). By adding noise to the gradients of both the ANN and the SNN, privacy protection can be enhanced without affecting network performance. The proposed method is tested on popular datasets including CIFAR10, MNIST, and Fashion MNIST, achieving accuracies of 88.1%, 99.3%, and 93.0%, respectively. The impact of three different privacy budgets (ϵ=0.5, 1.0, and 1.6) on accuracy is discussed. Experimental results show that the proposed Encrypted-SNN effectively improves the privacy-performance trade-off, which is of practical significance for protecting data privacy and can enhance the security and privacy of spiking neural networks.
Towards undetectable adversarial examples: a steganographic perspective
ABSTRACT. Over the past decade, adversarial examples have demonstrated an increasing ability to fool neural networks. However, most adversarial examples can be easily detected, especially under statistical analysis. Ensuring undetectability is crucial for the success of adversarial examples in practice. In this paper, we borrow the idea of the embedding suitability map from steganography and employ it to modulate the adversarial perturbation. In this way, adversarial perturbations are concentrated in hard-to-detect areas and attenuated in predictable regions. Extensive experiments show that the proposed scheme is compatible with various existing attacks and can significantly boost the undetectability of adversarial examples against both human inspection and statistical analysis while maintaining the same attack ability. Code is available at github.com/zengh5/Undetectable-attack.
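A rough sketch of the modulation idea follows, where a local-variance map is used as a stand-in suitability map (the paper borrows proper steganographic cost functions instead): the FGSM step is scaled per pixel so that perturbations concentrate in textured, hard-to-detect regions. The threshold-free normalization and the toy classifier are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def texture_suitability(x, k=5):
    """Per-pixel local standard deviation used as a hypothetical suitability map."""
    mean = F.avg_pool2d(x, k, stride=1, padding=k // 2)
    var = F.avg_pool2d(x * x, k, stride=1, padding=k // 2) - mean ** 2
    s = var.clamp_min(0).sqrt().mean(dim=1, keepdim=True)          # average over channels
    return s / (s.amax(dim=(2, 3), keepdim=True) + 1e-8)           # normalize to [0, 1]

def modulated_fgsm(model, x, y, eps=8 / 255):
    """One FGSM step whose magnitude is modulated by the suitability map."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    suit = texture_suitability(x.detach())                          # larger where changes are harder to detect
    x_adv = x.detach() + eps * suit * grad.sign()                   # scale the step per pixel
    return x_adv.clamp(0, 1)

# Usage with a toy classifier:
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
x_adv = modulated_fgsm(model, torch.rand(4, 3, 32, 32), torch.tensor([0, 1, 2, 3]))
```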
SDBC: A Novel and Effective Self-Distillation Backdoor Cleansing Approach
ABSTRACT. Deep Neural Networks (DNNs) are vulnerable to backdoor attacks, which only need to poison a small portion of samples to control the behavior of the target model. Moreover, the escalating stealth and power of backdoor attacks present not only significant challenges to backdoor defenses but also enormous potential threats to the widespread adoption of DNNs.
In this paper, we propose a novel backdoor defense framework, called Self-Distillation Backdoor Cleansing (SDBC), to remove backdoor triggers from an attacked model. For the practical scenario where only a very small portion of clean data is available, SDBC introduces self-distillation to cleanse the backdoor in DNNs. Extensive experiments demonstrate that SDBC can effectively remove backdoor triggers under 6 state-of-the-art backdoor attacks using less than 5%, or even less than 1%, of the clean training data without compromising accuracy. Experimental results show that the proposed SDBC outperforms existing state-of-the-art (SOTA) methods, reducing the average ASR from 95.36% to 5.75% and increasing the average ACC by 1.92%.
A Reinforcement Learning-Based Controller for Intersection Signals Suffering from Information Attacks
ABSTRACT. With the rapid development of smart technology and wireless communication technology, Intelligent Transportation Systems (ITS) are considered an effective way to solve the traffic congestion problem. An ITS can collect real-time road vehicle information through connected vehicles (CVs) and sensors such as cameras, and through real-time information exchange, signals can implement adaptive adjustment more intelligently, effectively reducing vehicle delays and traffic congestion. However, this connectivity also poses new challenges, as malicious attacks can affect traffic safety and efficiency. Reinforcement learning is considered the future trend of control algorithms for intelligent transportation systems. In this paper, we design reinforcement learning-based control algorithms for an intersection signal subjected to malicious attacks. The results show that the reinforcement learning-based signal control model reduces vehicle delay and queue length by 22% and 23%, respectively, relative to timing control. Meanwhile, reinforcement learning is a model-free control method, which prevents attackers from targeting flaws in specific control logic and allows the impact of information attacks to be evaluated more effectively. We further design a coordinated state-tampering attack across different lanes; the results show that the impact is greatest when the attacked states are in the same phase.
Quantum Autoencoder Frameworks for Network Anomaly Detection
ABSTRACT. Detecting anomalous activities in network traffic is important for the timely identification of emerging cyber attacks, and accurate analysis of emerging patterns in the traffic is critical to identify suspicious behaviors. In this paper, novel quantum deep autoencoder-based anomaly detection frameworks are proposed for accurately detecting security attacks that emerge in the network. In particular, we propose three frameworks: the first constructs several reconstruction-error-threshold-based methods; the second combines a quantum autoencoder with a one-class support vector machine; and the third combines a quantum autoencoder with a quantum random forest. Using a publicly available benchmark dataset, the quantum frameworks' effectiveness in accurately detecting the attacks is evaluated. Our empirical evaluations demonstrate improvements in accuracy and F1-score for all three frameworks.
MIC: An Effective Defense Against Word-level Textual Backdoor Attacks
ABSTRACT. Backdoor attacks, which manipulate model output, have garnered significant attention from researchers. However, some existing word-level backdoor attacks on NLP models are difficult to defend against effectively due to their concealment and diversity. These covert attacks use pairs of words that appear similar to the naked eye but are mapped to different word vectors by the NLP model, thereby bypassing existing defenses. To address this issue, we propose incorporating triplet metric learning into the standard training phase of NLP models to defend against existing word-level backdoor attacks. Specifically, metric learning is used to minimize the distance between the vectors of similar words while maximizing the distance between them and the vectors of other words. Additionally, given that metric learning may reduce a model's sensitivity to semantic changes caused by subtle perturbations, we add contrastive learning after the model's standard training. Experimental results demonstrate that our method performs well against the two most stealthy existing word-level backdoor attacks.
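The metric-learning component can be pictured with a small triplet-style loss over word embeddings; the anchor/positive/negative roles, dimensionality, and margin below are illustrative assumptions, not MIC's exact training objective.

```python
import torch
import torch.nn.functional as F

def word_triplet_loss(anchor, positive, negative, margin=0.5):
    """Pull visually similar word pairs together, push unrelated words away."""
    d_pos = 1 - F.cosine_similarity(anchor, positive)     # anchor vs. look-alike word
    d_neg = 1 - F.cosine_similarity(anchor, negative)     # anchor vs. unrelated word
    return F.relu(d_pos - d_neg + margin).mean()

# Usage with random 300-d embeddings standing in for word vectors.
a, p, n = torch.randn(32, 300), torch.randn(32, 300), torch.randn(32, 300)
loss = word_triplet_loss(a, p, n)
```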
ABSTRACT. Botnets are one of the most serious cybersecurity threats facing organizations today. Although botnet analysis and detection have produced many research results, botnets remain strongly concealed and difficult to identify. Therefore, we propose a botnet detection method based on NSA and DRN. This method uses our improved NSA to augment the preprocessed, dimensionality-reduced malicious traffic data that have few samples, and then extracts useful features of network traffic from two dimensions through a SENet-based DRN combined with BiGRU. Experimental results on the CICIDS-2017 and UNSW-NB15 datasets show that our proposed method achieves high botnet detection accuracy, 99.98% and 99.94% respectively, and improves the detection accuracy for rare malicious traffic. In addition, an ablation study further demonstrates the good generalization ability and robustness of our method for botnet detection.
Multi-granularity Deep Vulnerability Detection using Graph Neural Networks
ABSTRACT. Vulnerability detection has become increasingly crucial due to escalating cybersecurity threats. Investigating automated vulnerability detection techniques that avoid both high false positives and high false negatives is an important issue in the current software security field. In recent years, there has been a substantial focus on deep learning-based vulnerability detectors, which have achieved remarkable success. To fill the gap in multi-granularity program representation, we propose MulGraVD, a deep learning-based vulnerability detector at the function level. MulGraVD captures the continuity and structure of the programming language by considering information at the word, statement, basic block, and function granularities, respectively. To overcome the constraint posed by the hyperparameter-determined number of layers in the information aggregation process of graph neural networks, MulGraVD serially passes information from coarse to fine granularity, which facilitates the mining of vulnerability patterns. Our experimental evaluation on the FFMPeg+Qemu and ReVeal datasets shows that MulGraVD significantly outperforms existing state-of-the-art methods in terms of precision, recall, and F1 score, with average improvements of 11.62% in precision, 27.69% in recall, and 19.71% in F1 score.
Privacy-Preserving Federated Compressed Learning Against Data Reconstruction Attacks Based on Secure Data
ABSTRACT. Federated learning is a new distributed learning framework with data privacy preservation, in which multiple users collaboratively train models without sharing data. However, recent studies highlight potential privacy leakage through the shared gradient information. Several defense strategies, including gradient encryption and perturbation, have been suggested, but these strategies either involve high complexity or remain susceptible to attacks. To counter these challenges, we propose to train on secure compressive measurements via compressed learning, thereby achieving local data privacy protection with minimal performance degradation. A feasible way to boost performance is the joint optimization of the sensing matrix and the inference network during the training phase, but this may again be vulnerable to data reconstruction attacks. Thus, we further incorporate a traditional lightweight encryption scheme to protect data privacy. Experiments conducted on the MNIST and FMNIST datasets substantiate that our schemes achieve a satisfactory balance between privacy protection and model performance.
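The compressed-learning idea, training directly on compressive measurements so that raw pixels never leave the client, can be sketched as follows. The sensing matrix size, the toy keyed permutation standing in for the traditional encryption scheme, and the classifier head are all assumptions for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
D, M = 28 * 28, 200                         # signal dimension, number of measurements
Phi = torch.randn(M, D) / M ** 0.5          # shared random sensing matrix (kept fixed)

def measure(x_flat, key=42):
    """Compressive measurement y = Phi x, followed by a toy keyed permutation."""
    perm = torch.randperm(M, generator=torch.Generator().manual_seed(key))
    y = x_flat @ Phi.t()
    return y[:, perm]                       # lightweight keyed scrambling of the measurements

classifier = nn.Sequential(nn.Linear(M, 256), nn.ReLU(), nn.Linear(256, 10))
x = torch.rand(32, D)                       # a batch of flattened images
logits = classifier(measure(x))             # training operates on measurements only
```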
ABSTRACT. Deinterlacing is a classical issue in video processing, aimed at generating progressive video from interlaced content. Many precious videos that are difficult to reshoot still contain interlaced content. Previous methods have primarily focused on simple interlacing mechanisms and have struggled to handle the complex artifacts present in real-world early videos. Therefore, we propose a Transformer-based deinterlacing method consisting of a Feature Extractor, a De-Transformer, and a Residual DenseNet module. By incorporating self-attention from the Transformer, our proposed method is able to better utilize inter-frame motion correlation. Additionally, we combine a carefully designed loss function and residual blocks to train an end-to-end deinterlacing model. Extensive experimental results on various video sequences demonstrate that our proposed method outperforms state-of-the-art methods on different tasks by 1.41 to 2.64 dB. Furthermore, we also discuss several related issues, such as the rationality of the network structure. The code for our proposed method is available at https://github.com/Anonymous2022-cv/DeT.git.
Stereoential Net: Deep Network for Learning Building Height Using Stereo Imagery
ABSTRACT. Height estimation plays a crucial role in the planning and assessment of urban development, enabling effective decision-making and evaluation of urban built areas. Accurate estimation of building heights from remote sensing optical imagery poses significant challenges in preserving both the overall structure of complex scenes and the intricate elevation details of the buildings. This paper proposes a novel end-to-end deep learning-based network (Stereoential Net) comprising a multi-scale differential shortcut connection module (MSDSCM) at the decoding end and a modified stereo U-Net (mSUNet). The proposed Stereoential network performs a multi-scale differential fusion of decoding features to preserve fine details for improved height estimation using stereo optical imagery. Unlike existing methods, our approach does not use any multi-spectral satellite imagery; instead, it employs only freely available optical imagery, yet it achieves superior performance. We evaluate our proposed network on two benchmark datasets, the IEEE Data Fusion Contest 2018 (DFC2018) dataset and the 42-cities dataset. The 42-cities dataset comprises 42 densely populated cities in China with diverse sets of buildings of varying shapes and sizes. The quantitative and qualitative results reveal that our proposed network outperforms the SOTA algorithms on DFC2018. Our method reduces the root-mean-square error (RMSE) by 0.31 meters compared to state-of-the-art multi-spectral approaches on the 42-cities dataset. The code will be made publicly available via a GitHub repository.
WCA-VFnet: a dedicated complex forest smoke fire detector
ABSTRACT. Forest fires pose a significant threat to ecosystems, causing extensive damage. While state-of-the-art detection algorithms like YoloX, Deformable DETR, and VarifocalNet have demonstrated remarkable performance in the field of object detection, their effectiveness in detecting forest smoke fires, especially in complex scenarios with small smoke and flame targets, remains limited. To address this issue, we propose WCA-VFnet, an innovative approach that incorporates the Weld C-A component—a method featuring shared convolution and fusion attention. Furthermore, we have curated a distinctive dataset called T-SMOKE, specifically tailored for detecting small-scale, low-resolution forest smoke fires. Our experimental results show that WCA-VFnet achieves a significant improvement of approximately 35% in average precision (AP) for detecting small flame targets compared to Deformable DETR.
ABSTRACT. Novel view synthesis (NVS) aims to synthesize photo-realistic images depicting a scene by utilizing existing source images, and the synthesized images should be as faithful as possible to the scene content. We present Deep Normalized Stable View Synthesis (DNSVS), an NVS method for large-scale scenes based on the pipeline of Stable View Synthesis (SVS). SVS combines neural networks with the 3D scene representation obtained from structure-from-motion and multi-view stereo, where the view rays corresponding to each surface point of the scene representation and the source-view feature vectors together yield the value of each pixel in the target view. However, it weakens geometric information in the refinement stage, resulting in blur and artifacts in novel views. To address this, DNSVS leverages the depth map to enhance the rendering process via a normalization approach. The proposed method is evaluated on the Tanks and Temples dataset as well as the FVS dataset. The average Learned Perceptual Image Patch Similarity (LPIPS) of our results is better than that of state-of-the-art NVS methods by 0.12%, indicating the superiority of our method.
ABSTRACT. Driven by powerful convolutional neural networks, image inpainting has made tremendous progress. Recently, the transformer has demonstrated its effectiveness in various vision tasks, mainly due to its capacity to model long-term relationships. However, for image inpainting, the transformer tends to fall short in modeling local information, and interference from damaged regions poses additional challenges. To tackle these issues, we introduce a novel Semantic U-shaped Transformer (SUT). The SUT is designed with spectral transformer blocks in its shallow layers, effectively capturing local information, while its deeper layers use BRA transformer blocks to model global information. A key feature of the SUT is its attention mechanism, which employs bi-level routing attention. This approach significantly reduces the interference of damaged regions on the overall information, making the SUT well suited to image inpainting tasks. Experiments on several datasets indicate that the proposed method outperforms current state-of-the-art (SOTA) inpainting approaches: on average, its PSNR is 0.93 dB higher than SOTA, and its SSIM is higher by 0.026.
A Novel Interaction Convolutional Network Based on Dependency Trees for Aspect-level Sentiment Analysis
ABSTRACT. Aspect-based sentiment analysis aims to identify the sentiment polarity of a given aspect word in a sentence. Due to the complexity of sentences in real texts, models based on graph neural networks still struggle to accurately capture the relationship between aspect words and opinion words, limiting classification accuracy. To solve this problem, this paper proposes a novel aspect-level sentiment analysis model based on an interactive convolutional network with dependency trees, named ASAI-DT for short. In particular, ASAI-DT first extracts the aspect word representations from the sentence representation produced by a Bi-GRU model. Meanwhile, self-attention scores for the sentence and aspect representations are calculated separately by a self-attention mechanism in order to reduce attention to irrelevant information. Afterwards, the model constructs sub-trees of the dependency tree for the words, and the attention weight scores of the aspect representations are integrated into the sub-trees. The resulting comprehensive information about aspect words is then processed by a graph convolutional network to maximize the retention of valid information and minimize the interference of noise. Finally, the effective information is preserved more completely in the integrated information through the interactive network. Extensive experiments on multiple datasets show that the proposed ASAI-DT model is both effective and accurate for aspect-level sentiment analysis, outperforming many aspect-based sentiment analysis models.
Differentiable Topics Guided New Paper Recommendation
ABSTRACT. There are a large number of scientific papers published each year. Since papers differ greatly in their theoretical and technological advances, it is challenging to recommend valuable new papers to interested researchers. Papers usually make contributions at multiple levels, and accordingly, users also have fine-grained retrieval requirements. Moreover, the propagation of academic knowledge is asymmetric. In this paper, we investigate the new paper recommendation task from the viewpoint of the topics involved and use the concept of subspaces to distinguish different levels of innovation or academic contribution of papers. We adopt a neural topic model to represent papers by topic distributions over different subspaces. The academic influence between papers is modeled as topic propagation, learned by an asymmetric convolution on the citation network, reflecting the asymmetry of academic knowledge propagation. Experimental results on real datasets show that our model outperforms the baselines on new paper recommendation. In particular, the introduced subspace embeddings of papers are differentiable over topics, which helps identify paper innovations. Besides, we conduct experiments from multiple aspects to verify the validity of our model.
Co-GAN: A Text-to-Image Synthesis Model with Local and Integral Features
ABSTRACT. Text-to-image synthesis is a promising technology that generates realistic images from textual descriptions with deep learning models. However, state-of-the-art text-to-image synthesis models often struggle to balance the overall integrity and local diversity of objects with rich details, leading to unsatisfactory results for some domain-specific images, such as those in industrial applications. To address this issue, we propose Co-GAN, a text-to-image synthesis model that introduces two modules to enhance local diversity and maintain overall structural integrity, respectively. The Local Feature Enhancement (LFE) module improves the local diversity of generated images, while the Integral Structural Maintenance (ISM) module ensures that integral information is preserved. Furthermore, a cascaded central loss is proposed to address instability during generative training. To tackle the problem of incomplete image types in existing datasets, we create a new text-to-image synthesis dataset containing seven types of industrial components and evaluate various existing methods on it. The results of comparative and ablation experiments show that, compared with other current methods, the images generated by Co-GAN contain more details while better maintaining overall integrity.
User stance aware network for rumor detection using semantic relation inference and temporal graph convolution
ABSTRACT. The massive propagation of rumors has impaired the credibility of online social networks, while effective rumor detection remains difficult. Recent studies leverage stance inference to explore the semantic evidence in comments to improve detection performance. However, existing models only consider stance-relevant semantic features and ignore stance distribution and evolution, thus leaving room for improvement. Moreover, we argue that stance inference without considering the context in threads may lead to incorrect semantic features being accumulated and carried through to rumor detection. In this paper, we propose a user stance aware attention network (USAT), which learns the temporal features in semantic content, individual stance, and collective stance for rumor detection. Specifically, a high-order graph convolutional operator is designed to aggregate the preceding posts of each post, ensuring a complete semantic context for stance inference. Two temporal graph convolutional networks work in parallel to model the evolution of stance distribution and semantic content respectively, and share stance-based attention for de-noising content aggregation. Extensive experiments demonstrate that our model outperforms the state-of-the-art baselines. Our model will be available on GitHub upon acceptance.
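One generic way to aggregate the preceding posts of each post is multi-hop propagation over a temporally masked adjacency matrix. The sketch below illustrates that general idea only; the masking rule, normalisation, and hop count are assumptions, not the USAT operator itself.

```python
import numpy as np

def high_order_aggregate(adj, feats, order=2):
    """Sum row-normalised powers A^1..A^order so each post aggregates multi-hop predecessors."""
    agg = np.zeros_like(feats)
    a_k = np.eye(adj.shape[0])
    for _ in range(order):
        a_k = a_k @ adj                                        # next hop along reply edges
        row_sum = a_k.sum(axis=1, keepdims=True)
        norm = np.divide(a_k, row_sum, out=np.zeros_like(a_k), where=row_sum > 0)
        agg += norm @ feats
    return agg / order

# Toy thread of 4 posts; adj[i, j] = 1 if post j directly precedes post i (reply edge).
adj = np.array([[0, 0, 0, 0],
                [1, 0, 0, 0],
                [1, 0, 0, 0],
                [0, 1, 0, 0]], dtype=float)
feats = np.random.randn(4, 16)                                 # stand-in post embeddings
context = high_order_aggregate(adj, feats, order=2)
print(context.shape)  # (4, 16)
```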
ABSTRACT. Supervised learning, especially supervised deep learning, requires large amounts of labeled data. One approach to collecting large amounts of labeled data is to use a crowdsourcing platform where numerous workers perform the annotation tasks. However, the annotation results often contain label noise, as annotation skills vary depending on the crowd workers and their ability to complete the task correctly. Learning from Crowds is a framework that directly trains models on noisy labeled data from crowd workers. In this study, we propose a novel Learning from Crowds model inspired by SelectiveNet, which was proposed for the selective prediction problem. The proposed method, called Label Selection Layer, trains a prediction model by automatically determining whether to use a worker's label for training via a selector network. A major advantage of the proposed method is that it can be applied to almost all variants of supervised learning problems by simply adding a selector network and changing the objective function of existing models, without explicitly assuming a model of the noise in crowd annotations. The experimental results show that the performance of the proposed method is almost equivalent to or better than the Crowd Layer, one of the state-of-the-art methods for Deep Learning from Crowds, except in the regression problem case.
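The SelectiveNet-style idea of gating each crowd label's contribution to the loss can be sketched in a few lines of PyTorch: a selector head emits a selection probability per example, the classification loss is selection-weighted, and a coverage penalty keeps the selector from rejecting everything. The architecture, coverage target, and penalty weight below are illustrative assumptions, not the paper's exact Label Selection Layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelectivePredictor(nn.Module):
    def __init__(self, in_dim, n_classes):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU())
        self.classifier = nn.Linear(64, n_classes)             # prediction head
        self.selector = nn.Linear(64, 1)                       # decides whether to use the crowd label

    def forward(self, x):
        h = self.backbone(x)
        return self.classifier(h), torch.sigmoid(self.selector(h)).squeeze(-1)

def selective_loss(logits, select, noisy_labels, coverage=0.7, lam=32.0):
    """Selection-weighted cross-entropy plus a squared penalty when average selection
    falls below the target coverage (the SelectiveNet-style constraint)."""
    ce = F.cross_entropy(logits, noisy_labels, reduction="none")
    risk = (select * ce).sum() / select.sum().clamp_min(1e-8)
    penalty = torch.clamp(coverage - select.mean(), min=0.0) ** 2
    return risk + lam * penalty

# Toy batch: 32 examples, 10 features, 3 classes, labels coming from (possibly noisy) crowd workers.
model = SelectivePredictor(10, 3)
x = torch.randn(32, 10)
y_crowd = torch.randint(0, 3, (32,))
logits, select = model(x)
loss = selective_loss(logits, select, y_crowd)
loss.backward()
```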
Empirical Analysis of Multi-label Classification on GitterCom using BERT
ABSTRACT. To maintain development awareness, simplify project coordination, and prevent misinterpretation, communication is essential for software development teams. Instant private messaging, group chats, and code sharing are just a few of the capabilities that chat rooms provide to meet the communication demands of software development teams, all in real time. Consequently, chat rooms have gained popularity among developers. Gitter is one such platform, and the conversations it contains can be a treasure trove of data for academics researching open-source software systems. This research uses the GitterCom dataset, the largest collection of carefully labeled and curated Gitter developer messages, and performs multi-label classification for the Purpose Category in the dataset. Extensive empirical analysis is performed on 6 feature selection techniques, 14 machine learning classifiers, and the BERT transformer architecture with layer-by-layer comparison. Our research pipeline achieves strong results, with the Extra Trees and Random Forest classifiers reaching median AUC (OvR) scores of 0.94 and 0.92, respectively. Furthermore, the proposed research pipeline can be applied to generic multi-label text classification of software developer forum text.
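A one-vs-rest multi-label pipeline with an Extra Trees classifier and per-label ROC-AUC, in the spirit of the evaluation described above, can be assembled with scikit-learn as follows. The TF-IDF features, synthetic messages, and random labels are placeholders, not the GitterCom preprocessing or annotations.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Placeholder chat messages and multi-label purpose annotations (3 hypothetical purpose categories).
messages = ["how do I configure the build?", "thanks, that fixed it",
            "PR #42 is ready for review", "anyone seen this stack trace?"] * 50
labels = np.random.randint(0, 2, size=(len(messages), 3))

X = TfidfVectorizer(max_features=500).fit_transform(messages)
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.3, random_state=0)

clf = OneVsRestClassifier(ExtraTreesClassifier(n_estimators=100, random_state=0))
clf.fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)                               # shape (n_samples, n_labels)
print("macro AUC (OvR):", roc_auc_score(y_te, scores, average="macro"))
```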
Genetic Programming Symbolic Regression with Simplification-Pruning Operator for Solving Differential Equations
ABSTRACT. Differential equations (DEs) are important mathematical models for describing natural phenomena and engineering problems. Finding analytical solutions for DEs has both theoretical and practical benefits. However, traditional methods for finding analytical solutions only work for some special forms of DEs, such as equations with separable variables or those transformable into ordinary differential equations. For general nonlinear DEs, analytical solutions are often hard to obtain. Currently popular neural-network-based methods require large amounts of data to train the network and only give approximate solutions that suffer from errors and instability; they are also black-box models that are not interpretable. To obtain analytical solutions for DEs, this paper proposes a symbolic regression algorithm based on genetic programming with a simplification-pruning operator (SP-GPSR). This method introduces a new operator that simplifies the individual expressions in the population and randomly removes some structures in the formulas. Moreover, the method uses multiple fitness functions that consider how accurately the analytical solution satisfies both the sampled data and the differential equations. In addition, the algorithm uses a hybrid optimization technique to improve search efficiency and convergence speed. This paper conducts experiments on two typical classes of DEs. The results show that the proposed method can effectively find analytical solutions for DEs with high accuracy and simplicity.
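The kind of multi-objective fitness the abstract describes, scoring a candidate symbolic expression against both a differential-equation residual and sampled data, can be sketched with SymPy. The ODE y' + y = 0, the candidate expressions, and the weighting below are illustrative only, not the SP-GPSR operators or benchmarks.

```python
import numpy as np
import sympy as sp

x = sp.symbols("x")

def fitness(candidate, xs, ys, w_residual=1.0, w_data=1.0):
    """Lower is better: mean squared residual of y' + y = 0 plus mean squared error against data."""
    residual = sp.diff(candidate, x) + candidate               # plug the candidate into the ODE
    res_fn = sp.lambdify(x, residual, "numpy")
    cand_fn = sp.lambdify(x, candidate, "numpy")
    res_err = np.mean(np.square(res_fn(xs) * np.ones_like(xs)))
    data_err = np.mean(np.square(cand_fn(xs) * np.ones_like(xs) - ys))
    return w_residual * res_err + w_data * data_err

xs = np.linspace(0.0, 2.0, 50)
ys = np.exp(-xs)                                               # samples of the true solution
candidates = [sp.exp(-x), sp.cos(x), 1 - x + x**2 / 2]         # toy GP individuals
for cand in candidates:
    print(sp.sstr(cand), "->", fitness(cand, xs, ys))
# sp.simplify(cand) is the kind of call a simplification-pruning operator could build on.
```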
ABSTRACT. Offensive content on social media has become a serious issue, which makes its automatic detection a crucial task. Deep learning approaches for natural language processing (NLP) have proven to be at or even above human-level accuracy for offensive language detection tasks, which justifies deploying deep learning models for them. However, unlike humans, these models lack one key aspect: explainability. In this paper, we provide an explainable model for offensive language detection in a multi-task learning setting. Our model achieved an F1 score of 0.78 on the OLID dataset and 0.85 on the SOLID dataset. We also provide a detailed analysis of the model's interpretability.
Comparative Analysis of the Linear Regions in ReLU and LeakyReLU Networks
ABSTRACT. Networks with piecewise linear activation functions partition the input space into numerous linear regions. As such, the number of linear regions can serve as a metric to quantify the expressive capacity of networks employing ReLU (Rectified Linear Unit) and LeakyReLU activations. One notable drawback of the ReLU network lies in the potential occurrence of the "dying ReLU" issue during training, whereby the output and gradient remain zero when the input to a ReLU layer is negative. This results in ineffective weight updates and renders the affected neurons unresponsive, consequently impeding their contribution to network training. In this study, we perform a statistical analysis on the actual number of linear regions expressed by ReLU and LeakyReLU networks, providing an intuitive explanation for the "dying ReLU" problem. Our findings indicate that, under consistent input distributions and network parameters, LeakyReLU networks generally exhibit stronger expressive capacity in terms of linear regions compared to ReLU networks. We hope that our research can provide inspiration for the design of activation functions and contribute to the exploration and analysis of the behaviors exhibited by piecewise linear activation functions in networks.
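A small experiment in the spirit of the statistical analysis above is to count the distinct layer-wise activation sign patterns that sampled inputs fall into, which serves as a sampled proxy for the number of linear regions a ReLU or LeakyReLU network realises over a bounded domain. The architecture, sampling range, and leak slope below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
dims = [2, 16, 16, 1]                                          # small MLP: 2-D input, two hidden layers
weights = [rng.standard_normal((dims[i], dims[i + 1])) for i in range(len(dims) - 1)]
biases = [rng.standard_normal(d) for d in dims[1:]]

def count_regions(activation, n_samples=20000):
    """Count distinct sign patterns of hidden pre-activations over uniform input samples."""
    x = rng.uniform(-1.0, 1.0, size=(n_samples, dims[0]))
    patterns = []
    h = x
    for w, b in zip(weights[:-1], biases[:-1]):                # hidden layers only
        pre = h @ w + b
        patterns.append(pre > 0)                               # breakpoint of ReLU and LeakyReLU is 0
        h = activation(pre)
    codes = np.concatenate(patterns, axis=1)
    return len({tuple(row) for row in codes})

relu = lambda z: np.maximum(z, 0.0)
leaky = lambda z: np.where(z > 0, z, 0.01 * z)
print("ReLU regions (sampled):     ", count_regions(relu))
print("LeakyReLU regions (sampled):", count_regions(leaky))
```

The counts differ with the same weights because the activation chosen in layer one changes the pre-activations (and hence the sign patterns) seen by deeper layers, which is where dead ReLU units lose expressiveness.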
DAformer: Transformer with Domain Adversarial Adaptation for EEG-based Emotion Recognition with Live Oil Paintings
ABSTRACT. The emergence of domain adaptation has brought remarkable advancement to EEG-based emotion recognition by reducing subject variability and thus increasing the accuracy of cross-subject tasks. A wide variety of materials have been employed to elicit emotions in experiments; however, artistic works, which aim to evoke emotional resonance in observers, are used relatively rarely. Previous research has shown promising results in EEG-based emotion recognition on static oil paintings. Since video clips are widely recognized as the most commonly used and effective stimuli, we adopted animated live oil paintings, a novel emotional stimulus in live form that is essentially a type of video clip but has fewer potential factors influencing EEG signals than traditional video clips, such as abrupt changes in background sound, contrast, and color tones. Moreover, previous studies on static oil paintings focused primarily on the subject-dependent task, and cross-subject analysis remains to be investigated. In this paper, we propose a novel DAformer model that combines the advantages of the Transformer and adversarial learning. To enhance the evocative performance of oil paintings, we introduce an innovative emotional stimulus by transforming static oil paintings into animated live forms. We develop a new emotion dataset, SEED-LOP (SJTU EEG Emotion Dataset-Live Oil Painting), and construct DAformer to verify the effectiveness of SEED-LOP. The results demonstrate higher accuracies for three-class emotion recognition when watching live oil paintings, with a subject-dependent accuracy of 61.73% and a cross-subject accuracy of 54.12%.
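Domain adversarial adaptation of the DANN family is typically built around a gradient reversal layer, which trains a feature encoder against a domain (here, subject) discriminator. The sketch below shows that standard component in PyTorch; the feature dimensions, heads, and losses are illustrative stand-ins, not the DAformer architecture.

```python
import torch
from torch import nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lambda in the backward pass."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None

def grad_reverse(x, lamb=1.0):
    return GradReverse.apply(x, lamb)

# Toy use: shared encoder, emotion classifier, and a subject/domain discriminator that sees
# gradient-reversed features, pushing the encoder toward subject-invariant representations.
encoder = nn.Sequential(nn.Linear(310, 64), nn.ReLU())         # 310 input features (illustrative)
emotion_head = nn.Linear(64, 3)                                # three emotion classes
domain_head = nn.Linear(64, 2)                                 # source vs. target subject

eeg = torch.randn(8, 310)
feat = encoder(eeg)
emotion_logits = emotion_head(feat)
domain_logits = domain_head(grad_reverse(feat, lamb=0.5))
loss = emotion_logits.sum() + domain_logits.sum()              # placeholder losses
loss.backward()
```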
Explainable Sparse Associative Self-Optimizing Neural Networks for Classification
ABSTRACT. Contemporary models used for supervised training often suffer from a large number of possible hyperparameter combinations, rigid non-adaptive architectures, underfitting, overfitting, the curse of dimensionality, etc. These issues slow down model optimization and consume many resources before satisfactory solutions are found. Since real-world objects are related and similar, we can not only train network parameters but also construct and automatically adapt the network structure to represent patterns and relationships. Such networks are easily explainable because they reproduce the most essential and frequent relationships in the training data and aggregate representations of similarities. This paper presents a new approach to detecting and representing similarities and object relationships in order to self-adapt a network structure to vectorized numerical training data. By doing so, the network facilitates the classification process by identifying hyperspace regions associated with the classes defined in a training dataset. This makes the produced models fully explainable and trustworthy. Furthermore, our approach demonstrates its ability to automatically reduce the dimensionality of input data, removing features that introduce distortions without substantially supporting the classification process. The presented network adaptation algorithm produces a sparse associative network structure fitted contextually to any given dataset by detecting relationships and similarities. In addition, this algorithm requires almost no hyperparameters to be set, unlike state-of-the-art methods. The explanation of the new associative adaptive approach is followed by comparisons of its classification results with other best-performing models and methods.
DRPDDet: Dynamic Rotated Proposals Decoder for Oriented Object Detection
ABSTRACT. Oriented object detection has gained popularity in diverse fields. However, for two-stage detection algorithms, generating high-quality proposals with a high recall rate remains a formidable challenge, especially in remote sensing images where sparse and dense scenes coexist. To address this, we propose the DRPDDet method, which aims to improve the accuracy and recall of proposals for oriented object detection. Our approach generates high-quality horizontal proposals and dynamically decodes them into rotated proposals to predict the final rotated bounding boxes. To obtain high-quality horizontal proposals, we introduce the HarmonyRPN module, which integrates foreground information from the RPN classification branch into the original feature map, creating a fused feature map that incorporates multi-scale foreground information. As a result, the RPN generates horizontal proposals that focus more on foreground objects, which improves regression performance. Additionally, we design a dynamic rotated proposals decoder that adaptively generates rotated proposals based on the constraints of the horizontal proposals, enabling accurate detection in complex scenes. We evaluate our method on the DOTA and HRSC2016 remote sensing datasets, and the experimental results demonstrate its effectiveness in complex scenes. Our method improves the accuracy of proposals in various scenarios while maintaining a high recall rate.
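Decoding a horizontal proposal into a rotated box is commonly done with delta offsets on centre, size, and angle. The sketch below shows that generic decoding step only; the angle parameterisation and offsets are illustrative, not the exact DRPDDet decoder.

```python
import numpy as np

def decode_rotated(h_proposal, deltas):
    """Turn a horizontal proposal (x1, y1, x2, y2) plus predicted deltas
    (dx, dy, dw, dh, dtheta) into a rotated box (cx, cy, w, h, theta)."""
    x1, y1, x2, y2 = h_proposal
    pw, ph = x2 - x1, y2 - y1
    pcx, pcy = x1 + 0.5 * pw, y1 + 0.5 * ph
    dx, dy, dw, dh, dtheta = deltas
    cx = pcx + dx * pw                  # shift the centre proportionally to the proposal size
    cy = pcy + dy * ph
    w = pw * np.exp(dw)                 # scale width/height with exponentiated deltas
    h = ph * np.exp(dh)
    theta = dtheta * np.pi              # illustrative angle parameterisation
    return np.array([cx, cy, w, h, theta])

print(decode_rotated((10.0, 20.0, 110.0, 70.0), (0.05, -0.02, 0.1, 0.0, 0.08)))
```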
Classification of Hard and Soft Wheat Species using Hyperspectral Imaging and Machine Learning Models
ABSTRACT. Ensuring the identification and authenticity of wheat seeds is a critical task in the food grain industry. In this work, twenty wheat varieties were collected from three different locations in India. The near-infrared (NIR) hyperspectral imaging technique (spectral range 900-1700 nm) was employed in conjunction with machine learning models to discriminate the twenty wheat varieties into two classes: hard wheat and soft wheat. Images were taken from both sides of the seed (ventral and dorsal), and the dataset includes images of 20,160 seeds. Five machine learning models were used for classification: Support Vector Machine (SVM), Linear Discriminant Analysis (LDA), Naive Bayes (NB), K-Nearest Neighbor (KNN), and Random Forest (RF). Five preprocessing techniques were applied to the mean spectral values of the hyperspectral images: Standard Normal Variate (SNV), Multiplicative Scatter Correction (MSC), Savitzky-Golay Smoothing (SG), Savitzky-Golay Smoothing First Derivative (SG-1), and Savitzky-Golay Smoothing Second Derivative (SG-2). The models' performance was evaluated on both raw and preprocessed data. The Support Vector Machine exhibited the best performance, attaining an accuracy of 95.01% on combined data (both ventral and dorsal sides), 95.05% on ventral-side data only, and 95.37% on dorsal-side data only.
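A brief sketch of the preprocessing-plus-classifier pipeline described above, using SNV followed by a Savitzky-Golay first derivative before an SVM, is shown below. The synthetic spectra stand in for the mean NIR spectra of the seeds; band count, window length, and SVM settings are illustrative.

```python
import numpy as np
from scipy.signal import savgol_filter
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def snv(spectra):
    """Standard Normal Variate: centre and scale each spectrum individually."""
    return (spectra - spectra.mean(axis=1, keepdims=True)) / spectra.std(axis=1, keepdims=True)

# Synthetic stand-in for mean spectra (256 bands) of hard (1) vs. soft (0) wheat seeds.
rng = np.random.default_rng(0)
n_seeds, n_bands = 400, 256
labels = rng.integers(0, 2, n_seeds)
spectra = rng.normal(0, 0.02, (n_seeds, n_bands)) + labels[:, None] * np.linspace(0, 0.1, n_bands)

X = savgol_filter(snv(spectra), window_length=11, polyorder=2, deriv=1, axis=1)  # SNV + SG-1
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.3, random_state=0)
clf = SVC(kernel="rbf", C=10.0).fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```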
Pushing the Boundaries of Chinese Painting Classification on Limited Datasets: Introducing a Novel Transformer Architecture with Enhanced Feature Extraction
ABSTRACT. The study delves into the realm of Chinese painting classification, a domain that has received limited attention in research. This research paper presents a comprehensive approach to address the challenges posed by limited datasets, feature map redundancy, and the inherent limitations of the Transformer model in this context. To overcome the scarcity of available data, a diverse and high-resolution Chinese painting dataset is meticulously curated, comprising iconic works from esteemed painters across various dynasties. In order to optimize feature extraction, an innovative strategy is employed, selecting a subset of channels that effectively capture the intricate details and distinctive features of Chinese paintings, thereby reducing redundancy and enhancing model efficiency. Furthermore, a novel Transformer architecture is proposed, leveraging local self-attention with sliding windows and introducing separate spatial token mixing and channel feature transformation through two residual connections. Experimental results demonstrate the superior classification and recognition accuracy achieved by this novel architecture on small datasets. This research contributes to the advancement of Chinese painting classification and offers valuable insights into the potential of deep learning models in this artistic domain. The dataset and code are publicly available at: https://github.com/qwerty0814/gangan
Effects of Brightness and Class-unbalanced Dataset on CNN Model Selection and Image Classification considering Autonomous Driving
ABSTRACT. Even though the approach of combining machine learning (ML) enhanced models and convolutional neural networks (CNNs) is used for adaptive CNN model selection, a thorough investigation of the effects of 1) image brightness and 2) class-balanced/-unbalanced datasets is needed, considering image classification (object detection) for autonomous driving in significantly different day- and night-time settings. In this empirical study we comprehensively investigate the effects of these two main issues using the ImageNet dataset, predictive models (premodels), and CNNs. Based on the experimental results and analysis, we reveal non-trivial pitfalls (up to a 58% difference in top-1 accuracy across differently class-balanced datasets) and opportunities in classification accuracy obtained by changing brightness levels and class-balance ratios in datasets.
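The kind of brightness sweep the study describes can be scripted by rescaling images and re-measuring top-1 accuracy at each level. The tiny placeholder model and random tensors below only illustrate the measurement loop, not the premodel, the CNNs, or the ImageNet setup used in the paper.

```python
import torch
from torch import nn
from torchvision.transforms.functional import adjust_brightness

# Placeholder classifier and data; in the study these would be ImageNet-scale CNNs and images.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).eval()
images = torch.rand(64, 3, 32, 32)                 # float images in [0, 1]
labels = torch.randint(0, 10, (64,))

for factor in (0.25, 0.5, 1.0, 1.5, 2.0):          # darker ... brighter
    adjusted = torch.stack([adjust_brightness(img, factor) for img in images])
    with torch.no_grad():
        top1 = (model(adjusted).argmax(dim=1) == labels).float().mean().item()
    print(f"brightness x{factor}: top-1 = {top1:.3f}")
```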
CMFN: Cross-Modal Fusion Network for Irregular Scene Text Recognition
ABSTRACT. Scene text recognition, as a cross-modal task involving vision and text, is an important research topic in computer vision. Most existing methods use language models to extract semantic information for optimizing visual recognition. However, the guidance of visual cues is ignored in the process of semantic mining, which limits the performance of the algorithm in recognizing irregular scene text. To tackle this issue, we propose a novel cross-modal fusion network (CMFN) for irregular scene text recognition, which incorporates visual cues into the semantic mining process. Specifically, CMFN consists of a position self-enhanced encoder, a visual recognition branch and an iterative semantic recognition branch. The position self-enhanced encoder provides character sequence position encoding for both the visual recognition branch and the iterative semantic recognition branch. The visual recognition branch carries out visual recognition based on the visual features extracted by a CNN and the position encoding information provided by the position self-enhanced encoder. The iterative semantic recognition branch, which consists of a language recognition module and a cross-modal fusion gate, simulates the way humans recognize scene text and integrates cross-modal visual cues for text recognition. The experiments demonstrate that the proposed CMFN algorithm achieves comparable performance to state-of-the-art algorithms, indicating its effectiveness.
Real-Time Instance Segmentation and Tip Detection for Neuroendoscopic Surgical Instruments
ABSTRACT. Location information of surgical instruments and their tips can be valuable for computer-assisted surgical systems and robotic endoscope control systems. While real-time methods for instrument segmentation and tip detection have been proposed for minimally invasive abdominal surgeries, the challenges become even greater in minimally invasive neurosurgery due to its narrow operating space and diverse tissue characteristics. In this paper, we introduce a real-time approach for instance segmentation and tip detection of neuroendoscopic surgical instruments. To address the specific requirements of neurosurgery, we design a tailored data augmentation strategy for this field and propose a mask filtering module to eliminate false-positive masks. Our method is evaluated using both a neurosurgical dataset and the EndoVis15' dataset. The experimental results demonstrate that the data augmentation module improves the accuracy of instrument detection and segmentation by up to 12.6%. Moreover, the mask filtering module enhances the precision of instrument tip detection with an improvement of up to 39.51%.
ABSTRACT. A comprehensive understanding of human-to-human interactions of interest in video streams, such as queuing, handshaking, fighting and chasing, is of immense importance to the surveillance of public security in regions like campuses, squares and parks. Different from conventional human interaction recognition, which uses choreographed videos as inputs, neglects concurrent interactive groups, and performs detection and recognition in separate stages, we introduce a new task named human-to-human interaction detection (HID). HID aims at detecting subjects, recognizing person-wise actions, and grouping people according to their interactive relations, all in one model. First, based on the popular AVA dataset created for action detection, we establish a new HID benchmark, termed AVA-Interaction (AVA-I), by adding annotations on interactive relations in a frame-by-frame manner. AVA-I consists of 85,254 frames and 86,338 interactive groups, and each image includes up to 4 concurrent interactive groups. Second, we present a novel baseline approach, SaMFormer, for HID, containing a visual feature extractor, a split stage which leverages a Transformer-based model to decode action instances and interactive groups, and a merging stage which reconstructs the relationship between instances and groups. All SaMFormer components are jointly trained in an end-to-end manner. Extensive experiments on AVA-I validate the superiority of SaMFormer over representative methods. The dataset and code will be made public to encourage more follow-up studies.
DeFusion: Aerial Image Matching Based on Fusion of Handcrafted and Deep Features
ABSTRACT. With the popularity of drones with vision sensors and the advancement of image processing technology, machine vision tasks based on image matching have received widespread attention. However, due to the complexity of aerial images, traditional matching methods based on handcrafted features unavoidably suffer from low robustness because they lack the ability to extract high-level semantics. On the other hand, deep learning shows great potential for improving matching accuracy, but at the cost of large amounts of task-specific samples and computing resources, making it infeasible in many scenarios. To fully leverage the strengths of both approaches, we propose DeFusion, a novel image matching solution with a fine-grained decision-level fusion algorithm that effectively combines handcrafted features and deep features. We train generic features on public datasets, enabling us to handle unseen scenes. We use RootSIFT as prior knowledge to guide the extraction of deep features, significantly reducing the computational overhead. We also carefully design preprocessing steps by incorporating the attitude information of the drone. As illustrated in our experimental results, the proposed scheme achieves 2.5-6x more correct matches overall, with improved robustness compared to existing methods.
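RootSIFT, mentioned above as the handcrafted prior, is the standard variant of SIFT obtained by L1-normalising each descriptor and taking the element-wise square root. The short OpenCV sketch below shows that step; it assumes a SIFT-enabled OpenCV build (version 4.4 or later, or a contrib build) and uses a random image purely to stay self-contained.

```python
import cv2
import numpy as np

def root_sift(descriptors, eps=1e-7):
    """Convert SIFT descriptors to RootSIFT: L1-normalise each row, then take the square root."""
    descriptors = descriptors / (descriptors.sum(axis=1, keepdims=True) + eps)
    return np.sqrt(descriptors)

# Any grayscale aerial image works here; a random image keeps the sketch self-contained.
image = (np.random.rand(480, 640) * 255).astype(np.uint8)
sift = cv2.SIFT_create()
keypoints, desc = sift.detectAndCompute(image, None)
if desc is not None:
    desc = root_sift(desc)
    print(desc.shape)                     # (n_keypoints, 128)
```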