Enhanced Motor Imagery Based Brain-Computer Interface via Vibration Stimulation and Robotic Glove for Post-Stroke Rehabilitation
ABSTRACT. Motor imagery based brain-computer interface (MI-BCI) has been extensively researched as a potential intervention to enhance motor function for post-stroke patients. However, the difficulties in performing imagery tasks and the constrained spatial resolution of electroencephalography complicate the decoding of fine motor imagery (MI). To overcome these limitations, an enhanced MI-BCI rehabilitation system based on vibration stimulation and a robotic glove is proposed in this paper. First, a virtual scene involving object-oriented palmar grasping and pinching actions is designed to enhance subjects' engagement in performing MI tasks by providing straightforward and specific goals. Then, vibration stimulation, which can offer proprioceptive feedback, is introduced to help subjects better switch their attention to the corresponding MI limbs. Finally, a self-designed pneumatic manipulator control module is developed for motion execution based on the MI classification results. Seven healthy individuals were recruited to validate the feasibility of the system in improving subjects' MI abilities. The results show that the classification accuracy of three-class fine MI can be improved to 65.67%, which is significantly higher than that reported in state-of-the-art studies. This demonstrates the great potential of the proposed system in the application of post-stroke rehabilitation training.
A Multi-Scale and Multi-Attention Network for Skin Lesion Segmentation
ABSTRACT. Accurately segmenting the diseased areas from dermoscopy images is highly meaningful for the diagnosis of skin cancer, and in recent years, methods based on deep convolutional neural networks have become the mainstream for automatic segmentation of skin lesions. Although these methods have made significant improvements in the field of skin lesion segmentation, establishing long-range dependencies remains a major challenge for convolutional neural networks due to the limitations of the convolutional kernel size. In order to address this limitation, this paper proposes a deep learning model for skin lesion segmentation called the Multi-Scale and Multi-Attention Network (MSMA-Net). The encoder part utilizes a pretrained ResNet for feature extraction. In the skip connection part, we adopt a novel non-local method called the Fully Attentional Block (FLA), which effectively captures long-range contextual information and retains attentions in all dimensions. In the decoder part, we propose a multi-attention decoder that consists of four attention modules, allowing effective attention to be given to the feature maps in three dimensions: spatial, channel, and scale. The scale attention module adaptively fuses multi-scale information to further improve the segmentation performance. We conducted experiments on two publicly available skin lesion segmentation datasets, ISIC2017 and ISIC2018, and the results demonstrate that MSMA-Net outperforms other methods, confirming the effectiveness of MSMA-Net.
Temporal Attention for Robust Multiple Object Pose Tracking
ABSTRACT. Estimating the pose of multiple objects has improved substantially since deep learning became widely used. However, performance deteriorates when objects are highly similar in appearance or occluded. This issue is usually addressed by leveraging temporal information that takes previous frames as a prior to improve the robustness of estimation. Existing research is either computationally expensive because it uses multiple frames, or inefficiently integrated with additional concepts. In this paper, we perform computationally efficient object association between two consecutive frames via attention through a video sequence. Furthermore, instead of heatmap-based approaches, we adopt a coordinate classification strategy that excludes post-processing, so that the network is built in an end-to-end fashion. Experiments on real data show that our approach achieves state-of-the-art results on the PoseTrack datasets.
CMCI: A Robust Multimodal Fusion Method For Spiking Neural Networks
ABSTRACT. Humans understand the external world through a variety of perceptual processes such as sight, sound, touch and smell. Simulating such biological multi-sensory fusion decisions using a computational model is important for both computer science and neuroscience research. Spiking Neural Networks (SNNs) mimic the real neural circuits of the brain and are expected to reveal the real multimodal perception mechanism. However, existing work on multimodal SNNs is still limited; most of it focuses only on audiovisual fusion and lacks a systematic comparison of the performance and robustness of the models. In this paper, we propose a novel fusion module called Cross-modality Current Integration (CMCI) for multimodal SNNs and systematically compare it with other fusion methods on visual, auditory and olfactory fusion recognition tasks. Besides, a regularization technique called Modality-wise Dropout (ModDrop) is introduced to further improve the robustness of multimodal SNNs when modalities are missing. Experimental results show that our method exhibits superiority in both modality-complete and modality-missing conditions without any additional networks or parameters.
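The idea behind modality-wise dropout can be sketched in a few lines: during training, whole modalities are zeroed at random so the model learns to cope with missing inputs. This is a minimal sketch under assumptions, not the authors' implementation; the function name `moddrop`, the drop probability `p_drop`, and the rule of always keeping at least one modality are all hypothetical choices made for illustration.

```python
import random

def moddrop(inputs, p_drop=0.3, rng=random):
    """Zero out whole modalities at random (hypothetical helper).

    `inputs` maps a modality name to a feature list. Each modality is
    dropped independently with probability `p_drop`; if every modality
    would be dropped, one is kept at random (an assumption of this sketch).
    """
    names = list(inputs)
    kept = [n for n in names if rng.random() >= p_drop]
    if not kept:  # never drop every modality
        kept = [rng.choice(names)]
    return {n: (list(v) if n in kept else [0.0] * len(v))
            for n, v in inputs.items()}

sample = {"visual": [0.2, 0.9], "auditory": [0.5, 0.1], "olfactory": [0.7, 0.3]}
random.seed(0)
print(moddrop(sample, p_drop=0.5))
```

At test time the function would simply not be applied; a missing modality then looks like a dropped one that the network has already seen during training.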
Dual-Branch Contrastive Learning for Network Representation Learning
ABSTRACT. Graph Contrastive Learning (GCL) is a self-supervised learning algorithm designed for graph data and has received widespread attention in the field of network representation learning. However, existing GCL-based network representation methods mostly use a single-branch contrastive approach, which makes it difficult to learn deeper semantic relationships and is easily affected by noisy connections during the process of obtaining global structural information embedding. Therefore, this paper proposes a network representation learning method based on a dual-branch contrastive approach. Firstly, the clustering idea is introduced into the process of embedding global structural information, and irrelevant nodes are selected and removed based on the clustering results, effectively reducing the noise in the embedding process. Then, a dual-branch contrastive method, similar to ensemble learning, is proposed, in which the two generated views are compared with the original graph separately, and the joint optimization method is used to continuously update the two views, allowing the model to learn more discriminative feature representations. The proposed method was evaluated on three datasets, Cora, Citeseer, and Pubmed, for node classification and dimensionality reduction visualization experiments. The results show that the proposed method achieved better performance compared to existing baseline models.
Multi-Granularity Contrastive Siamese Networks for Abstractive Text Summarization
ABSTRACT. Abstractive text summarization is an important task in natural language generation, which aims to compress input documents and generate concise and informative summaries. Sequence-to-Sequence (Seq2Seq) models have achieved good results in abstractive text summarization in recent years. However, such models are often sensitive to noise information in the training data and exhibit fragility in practical applications. To enhance the denoising ability of the models, we propose Multi-Granularity Contrastive Siamese Networks for Abstractive Text Summarization. Specifically, we first perform word-level and sentence-level data augmentation on the input text and integrate the noise information of the two granularities into the input text to generate augmented text pairs with diverse noise information. Then, we jointly train the Seq2Seq model using contrastive learning to maximize the consistency between the representations of the augmented text pairs through a Siamese network. We conduct empirical experiments on the CNN/Daily Mail and XSum datasets. The results show that our model achieves higher ROUGE scores and better performance in terms of human evaluation compared to many existing benchmarks. These results validate the effectiveness of our model, which has advantages in terms of robustness and flexibility.
Robust LS-QSVM Implementation via Efficient Matrix Factorization and Eigenvalue Estimation
ABSTRACT. Traditional SVM and LSSVM face challenges in handling large datasets due to the exponential increase in computational complexity. Quantum Support Vector Machines (QSVM) based on the HHL algorithm offer a potential solution, but practical implementation in the Noisy Intermediate-Scale Quantum (NISQ) era remains limited. Our work, Least Square-QSVM with Ridge Regression (LS-QSVM-RR), addresses these limitations by efficiently decomposing the LSSVM coefficient matrix and encoding it into quantum circuits via a matrix factorization algorithm. We also leverage quantum circuit properties to compute optimal ridge regression constraints by efficient eigenvalue estimation, achieving a balance between prediction accuracy and robustness in QSVM. Our contributions demonstrate the potential of Variational Quantum Algorithm (VQA)-based QSVM simulations and provide insights into handling large condition numbers in LSSVM coefficient matrices.
Joint Entity and Relation Extraction for Legal Documents based on Table Filling
ABSTRACT. Joint entity and relation extraction for legal documents is an important research task of judicial intelligence informatization, aiming at extracting structured triplets from rich unstructured legal texts. However, the existing methods for joint entity relation extraction in legal judgment documents often lack domain-specific knowledge and struggle to solve the problem of entity overlap in legal texts. To address these issues, we propose a joint entity and relation extraction method for legal documents based on table filling. First, we construct a legal dictionary with knowledge characteristics of the judicial domain based on the characteristics of judicial document data and incorporate it into a text encoding representation using a multi-head self-attention mechanism. Second, we transform the joint extraction task into a table-filling problem by constructing a two-dimensional table that can express the relation between word pairs for each relation separately and designing three table-filling strategies to decode the triples under the corresponding relations. The experimental results on the information extraction dataset in “CAIL2021” show that the proposed method has a significant improvement over the existing baseline model and achieves significant results in addressing the complex entity overlap problem in legal texts.
Matrix Contrastive Learning for Short Text Clustering
ABSTRACT. Text clustering is a critical task in the field of Natural Language Processing. However, traditional methods struggle with high-dimensional, sparse, and noisy data. Recently, many studies have combined contrastive learning with clustering to address this issue and achieved excellent clustering results. However, traditional contrastive learning methods suffer from class conflict. We propose a new framework called Matrix Contrastive Learning (MCL) for text clustering to address this issue. First, data augmentation techniques are utilized to generate pairs of positive and negative instances for all anchor examples. These pairs are mapped into a feature space, where the rows of the matrix represent soft labels for individual instances, and the columns represent cluster representations. We perform contrastive learning at both the instance and cluster levels using these rows and columns. To further improve the cluster allocation in unsupervised clustering tasks and alleviate the class conflict problem caused by instance-level contrastive learning in the unsupervised setting, the K-Nearest Neighbors algorithm is used to filter out negative instances. We conducted extensive experiments on eight challenging text datasets and compared MCL with six existing clustering methods. The results show that MCL significantly outperforms the competing methods.
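One way to read the neighbour-based filtering step is that an anchor's nearest neighbours probably share its cluster, so they should not be pushed away as negatives. The sketch below illustrates that reading only; the abstract does not give the exact rule, so the cosine similarity measure, the function name `filtered_negatives`, and the parameter `k` are assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def filtered_negatives(embeddings, anchor, k):
    """Indices usable as negatives for `anchor` (hypothetical helper).

    Every other instance is a candidate negative, except the k instances
    most similar to the anchor, which are treated as likely same-cluster
    samples and removed from the negative set.
    """
    others = [i for i in range(len(embeddings)) if i != anchor]
    ranked = sorted(others,
                    key=lambda i: cosine(embeddings[anchor], embeddings[i]),
                    reverse=True)
    neighbours = set(ranked[:k])
    return [i for i in others if i not in neighbours]

emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
print(filtered_negatives(emb, anchor=0, k=1))  # [2, 3]
```

Instance 1 is almost parallel to the anchor, so it is dropped from the negatives, while the orthogonal and opposite instances are kept.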
A Weakly Supervised Deep Learning Model for Alzheimer’s Disease Prognosis Using MRI and Incomplete Labels
ABSTRACT. Predicting cognitive scores using magnetic resonance imaging (MRI) can aid in the early recognition of Alzheimer's disease (AD) and provide insights into future disease progression. Existing methods typically ignore the temporal consistency of cognitive scores and discard the subjects with incomplete cognitive scores. In this paper, we propose a Weakly supervised Alzheimer’s Disease Prognosis (WADP) model that incorporates an image embedding network and a label embedding network to predict cognitive scores using baseline MRI and incomplete cognitive scores. The image embedding network is an attention consistency regularized network to project MRI into the image embedding space and output the cognitive scores at multiple time-points. The attention consistency regularization captures the correlations among time-points by encouraging the attention maps at different time-points to be similar. The label embedding network employs a denoising autoencoder to embed cognitive scores into the label embedding space and impute missing cognitive scores. This enables the utilization of subjects with incomplete cognitive scores in the training process. Moreover, a relation alignment module is incorporated to make the relationships between samples in the image embedding space consistent with those in the label embedding space. The experimental results on two ADNI datasets show that WADP outperforms the state-of-the-art methods.
TCNet: Texture and Contour-Aware Model for Bone Marrow Smear Region of Interest Selection
ABSTRACT. Bone marrow smear cell morphology is the quantitative analysis of bone marrow cell images. Due to the cell overlap and adhesion in bone marrow smears, it is essential to select uniformly distributed and clear sections as regions of interest (ROIs). However, current ROI selection models have not considered the characteristics of bone marrow smears, resulting in poor performance in practical applications. By comparing bone marrow smear ROIs and non-ROIs, we have identified significant differences in fundamental features, such as texture and contour. Therefore, we propose a texture and contour-aware bone marrow smear ROI selection model (TCNet). Inspired by multi-task learning, this model enhances its feature extraction capabilities for texture and contour by constructing different prediction modules to learn feature representations of texture and contour, and applying multi-level deep supervision with pseudo labels. To validate the effectiveness of the proposed method, we evaluate it on a self-built dataset. Experimental results show that the proposed model achieves a 2.22% improvement in classification accuracy compared to the baseline model. In addition, we verify the proposed module's generalizability by testing it on different backbone networks, and the results demonstrate its strong universality.
MGFNet: A Multi-Granularity Feature Fusion and Mining Network for Visible-Infrared Person Re-Identification
ABSTRACT. Visible-infrared person re-identification (VI-ReID) aims to match the same pedestrian in different forms captured by the visible and infrared cameras. Existing works on retrieving pedestrians focus on mining the shared feature representations by the deep convolutional neural networks. However, there are limitations of single-granularity for identifying target pedestrians in complex VI-ReID tasks. In this study, we propose a new Multi-Granularity Feature Fusion and Mining Network (MGFNet) to fuse and mine the feature map information of the network. The network includes a Local Residual Spatial Attention (LRSA) module and a Multi-Granularity Feature Fusion and Mining (MGFM) module to jointly extract discriminative features. The LRSA module aims to guide the network to learn fine-grained features that are useful for discriminating and generating more robust feature maps. Then, the MGFM module is employed to extract and fuse pedestrian features at both global and local levels. Specifically, a new local feature fusion strategy is designed for the MGFM module to identify subtle differences between various pedestrian images. Extensive experiments on two mainstream datasets, SYSU-MM01 and RegDB, show that the MGFNet outperforms the existing techniques.
DAGAN: Generative Adversarial Network with Dual Attention-enhanced GRU for Multivariate Time Series Imputation
ABSTRACT. Missing values are common in multivariate time series data, which limits the usability of the data and impedes further analysis. Thus, it is imperative to impute missing values in time series data. However, existing imputation techniques fail to take full advantage of the time-related data and have limitations in capturing potential correlations between variables. This paper presents a new model for imputing multivariate time series data called DAGAN, which comprises a generator and a discriminator. Specifically, the generator incorporates a Temporal Attention layer, a Relevance Attention layer, and a Feature Aggregation layer. The Temporal Attention layer utilizes an attention mechanism and recurrent neural network to address the RNN’s inability to model long-term dependencies in the time series. The Relevance Attention layer employs a self-attention-based network architecture to capture correlations among multiple variables in the time series. The Feature Aggregation layer integrates time information and correlation information using a residual network and a Linear layer for effective imputation of missing data. In the discriminator, we also introduce a temporal cueing matrix to aid in distinguishing between generated and real values. To evaluate the proposed model, we conduct experiments on two real-world time series datasets, and the findings indicate that DAGAN outperforms state-of-the-art methods by more than 13%.
Sharpness-aware Minimization for Out-of-Distribution Generalization
ABSTRACT. Machine learning models often suffer from a significant decline in performance when they encounter out-of-distribution (OOD) data that differs from the training distribution. The distribution shift can be broadly categorized into diversity shift and correlation shift. While seeking flat minima in optimization has been shown to improve a neural network’s generalization performance under the assumption of independent and identical distribution (IID), it also has been shown to be an effective strategy for improving OOD generalization. However, previous studies have mostly focused on addressing diversity shift, leaving the relationship between flat minima and correlation shift unresolved. To address the issue, we propose Sharpness-aware Invariant Risk Minimization (SIRM) as a novel approach to enhance generalization under correlation shift. Our method combines two parts: (1) Invariant risk minimization (IRM), which learns invariant relationships across multiple training environments, and (2) Sharpness-aware minimization (SAM), which finds flat minima. Our analysis reveals that IRM does not guarantee flat minima and SAM does not improve generalization under OOD. Moreover, we also analyze the relationship between flat minima and OOD data under correlation shift. Through extensive experiments conducted on image classification datasets, we demonstrate that our proposed method outperforms other methods with a competitive margin.
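The sharpness-aware part of the method can be illustrated on its own. A SAM update first moves the weights a small distance `rho` along the normalized gradient (toward the locally worst-case neighbour), then applies the gradient measured at that perturbed point to the original weights. The sketch below shows only this generic two-step update on a toy quadratic loss; it does not include the IRM penalty or any detail of SIRM, and the loss, `lr`, and `rho` values are assumptions chosen for illustration.

```python
import math

def grad(w):
    """Gradient of the toy loss f(w) = sum(w_i ** 2) (an assumed example loss)."""
    return [2.0 * x for x in w]

def sam_step(w, lr=0.1, rho=0.05):
    """One sharpness-aware update on the toy loss."""
    g = grad(w)
    norm = math.sqrt(sum(x * x for x in g)) or 1.0  # avoid division by zero
    # Step 1: ascend to the approximate worst-case point within radius rho.
    w_adv = [x + rho * gx / norm for x, gx in zip(w, g)]
    # Step 2: descend from the ORIGINAL weights using the perturbed gradient.
    g_adv = grad(w_adv)
    return [x - lr * gx for x, gx in zip(w, g_adv)]

w = [1.0, -2.0]
for _ in range(50):
    w = sam_step(w)
print(sum(x * x for x in w))  # small residual loss near the minimum
```

Because the perturbation has fixed radius `rho`, the iterates settle in a small neighbourhood of the minimum rather than exactly on it, which is expected for this update with a constant step size.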
MTSAN-MI: Multiscale Temporal-Spatial Convolutional Self-Attention Network for Motor Imagery Classification
ABSTRACT. EEG signals are widely utilized in brain-computer interfaces, where motor imagery (MI) data plays a crucial role. The effective alignment of MI-based EEG signals for feature extraction, decoding, and classification has always been a significant challenge. Decoding methods based on convolutional neural networks often encounter the issue of selecting the optimal receptive field, while convolution in the spatial domain cannot fully utilize the rich spatial topological information contained within EEG signals. In this paper, we propose a multiscale temporal-spatial convolutional self-attention network for motor imagery classification (MTSAN-MI). The proposed model starts with a multiscale temporal-spatial convolution module, in which temporal convolutional layers of varying scales across three different branches extract features according to their respective receptive fields, and graph convolutional networks are better equipped to leverage the intrinsic relationships between channels. The multi-head self-attention module is directly connected to capture global dependencies within the temporal-spatial features. Evaluation experiments are conducted on two four-class MI EEG datasets, which show that the state-of-the-art is achieved on one dataset, and the result is comparable to the best method on the other dataset. The ablation study also proves the importance of each component of the framework.
Knowledge-Distillation-Warm-Start Training Strategy for Lightweight Super-Resolution Networks
ABSTRACT. In recent years, studies on lightweight networks have made rapid progress in the field of image Super-Resolution (SR). Although the lightweight SR network is computationally efficient and saves parameters, the simplification of the structure inevitably leads to limitations in its performance. To further enhance the efficacy of lightweight networks, we propose a Knowledge-Distillation-Warm-Start (KDWS) training strategy. This strategy enables further optimization of lightweight networks using dark knowledge from traditional large-scale SR networks during warm-start training and can empirically improve the performance of lightweight models. For the experiments, we chose several traditional large-scale SR networks and lightweight networks as teacher and student networks, respectively. The student network is initially trained with a conventional warm-start strategy, followed by additional supervision from the teacher network for further warm-start training. The evaluation on common test datasets shows that our proposed training strategy can result in better performance for a lightweight SR network. Furthermore, our proposed approach can also be adopted in any deep learning network training process, not only image SR tasks, as it is not limited by network structure or task type.
Predefined-Time Event-Triggered Consensus for Nonlinear Multi-Agent Systems with Uncertain Parameter
ABSTRACT. In this paper, a novel predefined-time event-triggered control method is proposed to achieve consensus of multi-agent systems with parameter uncertainty. First, a new predefined-time stability theorem is given, and its correctness and feasibility are analyzed; its flexible preset time makes it more practical than existing stability theorems. Compared with existing stability theorems, this theorem simplifies the conditions that the Lyapunov function must satisfy and is easier to implement in practical applications. Second, an event-triggered control strategy is designed to reduce control costs. Then, based on the predefined-time stability theorem and the event-triggered controller, a new sufficient criterion is given for achieving consensus of multi-agent systems with parameter uncertainty. In addition, state consensus between nonlinear agents is reached within a predefined time, and the measurement error of each agent converges to zero within the predefined time. Finally, the validity and feasibility of the theoretical results are verified by a simulation example.
Multi-model smart contract vulnerability detection based on BiGRU
ABSTRACT. Smart contracts have been under constant attack from outside, with frequent security problems causing great economic losses to the virtual currency market, and their security research has attracted much attention in the academic community. Traditional smart contract detection methods rely heavily on expert rules, resulting in low detection precision and efficiency. In this paper, we explore the effectiveness of deep learning methods on smart contract detection and propose S-BiGRU, a multi-model smart contract vulnerability detection method that combines the Bi-directional Gated Recurrent Unit (BiGRU) with the Synthetic Minority Oversampling Technique (SMOTE). In a comparative study on the vulnerability detection of 10312 smart contract codes, the method achieves an identification accuracy of 90.17% and a recall rate of 97.7%. Compared with other deep network models, the S-BiGRU model has superior performance in terms of recall and accuracy.
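SMOTE balances a dataset by synthesizing new minority-class samples on the line segment between an existing minority sample and one of its nearest minority neighbours. The following is a minimal sketch of that generic interpolation step, not the pipeline used in the paper; the function name `smote`, its parameters, and the toy two-dimensional points are assumptions for illustration (real contract features would be higher-dimensional).

```python
import random

def smote(minority, n_new, k=2, rng=random):
    """Create n_new synthetic minority points (illustrative helper).

    Each new point is base + t * (mate - base), where `mate` is one of the
    k nearest minority neighbours of `base` and t is uniform in [0, 1).
    Requires at least two minority samples.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: sq_dist(base, p))[:k]
        mate = rng.choice(neighbours)
        t = rng.random()
        synthetic.append([b + t * (m - b) for b, m in zip(base, mate)])
    return synthetic

minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
random.seed(0)
print(smote(minority, n_new=3))
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region already occupied by the minority class.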
An Alignment and Matching Network with Hierarchical Visual Features for Multimodal Named Entity and Relation Extraction
ABSTRACT. In this paper, we study the tasks of multimodal named entity recognition (MNER) and multimodal relation extraction (MRE), both of which involve incorporating visual modality to complement text modality. The core issues are how to bridge the modality gap and reduce modality noise. To address the first issue, we introduce an image-text alignment (ITA) module to obtain a better unimodal representation by aligning the inconsistent representations between image and text, which come from different encoders. To tackle the second issue, we propose an image-text matching (ITM) module that constructs hard negatives to improve the model's ability to capture the semantic correspondence between text and image. Besides, we also selectively combine and concatenate the hierarchical visual features obtained from both global and visual objects of Vision Transformer (ViT) as improved visual prefix for modality fusion. We conduct extensive experiments to demonstrate the effectiveness of our method (AMNet) and achieve state-of-the-art performance on three benchmark datasets.
Temporal Modeling Approach for Video Action Recognition Based on Vision-Language Models
ABSTRACT. The usage of large-scale vision-language pre-training models plays an important role in reducing computational consumption and improving the accuracy of the video action recognition task. However, pre-trained models trained on image data may ignore temporal information, which is significant for video tasks. In this paper, we introduce a temporal modeling approach for the action recognition task based on large-scale pre-training models. We make the model capture the temporal information contained in frames by modeling the short-time local temporal information and the long-time global temporal information in videos separately. We introduce a multi-scale difference approach to obtain the difference between adjacent frames, and employ a cross-frame attention approach to capture semantic differences and details of temporal changes. In addition, we use residual attention blocks to implement the temporal Transformer and assign individual importance scores to each frame by computing the similarity of the frame to the clustering center, to obtain the overall temporal information of the video. Our model achieves 82.3% accuracy on the Kinetics400 dataset with just eight frames. Furthermore, zero-shot results on the HMDB51 dataset and UCF101 dataset demonstrate the strong transferability of our model.
Dual-Enhancement Model of Entity Pronouns and Evidence Sentence for Document-level Relation Extraction
ABSTRACT. Document-level relation extraction (DocRE) aims to identify all relations between entities in different sentences within a document. Most works are committed to achieving more accurate relation prediction by optimizing model structure. However, the usage of entity pronoun information and extracting evidence sentences are limited by incomplete manual annotation data. In this paper, we propose a Dual-enhancement model of entity pronouns and evidence sentences (DeepENT), which efficiently leverages pronoun information and effectively extracts evidence sentences to improve DocRE. First, we design an Entity Pronouns Enhancement Module, which achieves co-reference resolution and automatic data fusion to enhance the completeness of entity information. Then, we define two types of evidence sentences and design heuristic rules to extract them, used in obtaining sentence-aware context embedding. In this way, we can logically utilize complete and accurate evidence sentence information. Experimental results reveal that our approach performs excellently on the Re-DocRED benchmark, especially in predicting inter-sentence expression relations.
Hi-Stega: A Hierarchical Linguistic Steganography Framework Combining Retrieval and Generation
ABSTRACT. Due to the widespread use of social media, linguistic steganography, which embeds secret messages into normal text to protect their security and privacy, has been widely studied and applied. However, existing linguistic steganography methods ignore the correlation between social network texts, resulting in steganographic texts that are isolated units and prone to breakdowns in cognitive-imperceptibility. Moreover, the embedding capacity of text is also limited due to the fragmented nature of social network text.
In this paper, to make linguistic steganography practical in social network environments, we design a hierarchical linguistic steganography (Hi-Stega) framework. Combining the benefits of retrieval-based and generation-based steganography methods, we divide the secret message into data information and control information by taking advantage of the fact that social network contexts are associative. The data information is obtained by retrieving the secret message in a normal network text corpus, and the control information is embedded in the process of comment or reply text generation. The experimental results demonstrate that the proposed approach achieves a higher embedding payload while imperceptibility and security can also be guaranteed. (All datasets and codes used in this paper are released at https://github.com/wanghl21/Hi-Stega.)
Nearest Memory Augmented Feature Reconstruction for Unified Anomaly Detection
ABSTRACT. Reconstruction-based anomaly detection methods expect to reconstruct normality well but fail for abnormality. Memory modules have been exploited to avoid reconstructing anomalies, but they may overgeneralize by using memory in a weighted manner. Additionally, existing methods often require separate models for different objects. In this work, we propose nearest memory augmented feature reconstruction for unified anomaly detection. Specifically, the novel nearest memory addressing (NMA) module enables memory items to record normal prototypical patterns individually. In this way, the risk of over-generalization is mitigated while the capacity of the memory item is fully exploited. To overcome the constraint of training caused by NMA that has no real gradient defined, we perform end-to-end training with straight-through gradient estimation and exponential moving average. Moreover, we introduce the feature reconstruction paradigm to avoid the reconstruction challenge in the image space caused by information loss of the memory mechanism. As a result, our method can unify anomaly detection for multiple categories. Extensive experiments show that our method achieves state-of-the-art performance on both MVTecAD and BTAD datasets under the unified setting. Remarkably, it achieves comparable or better performance than other algorithms under the separate setting.
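The contrast between weighted memory reads and nearest addressing can be made concrete. In the sketch below, a query feature reads exactly one memory item (its nearest), and only that item is then moved toward the query with an exponential moving average. This is a forward-pass illustration under assumptions: the function names, the squared Euclidean distance, and the `decay` value are hypothetical, and the straight-through gradient estimation mentioned in the abstract is not shown because it requires an automatic differentiation framework.

```python
def nearest_index(memory, feature):
    """Index of the memory item closest to `feature` (squared Euclidean distance)."""
    return min(range(len(memory)),
               key=lambda i: sum((m - f) ** 2 for m, f in zip(memory[i], feature)))

def read_and_update(memory, feature, decay=0.9):
    """Read the single nearest item, then update it in place by EMA.

    Returns (index, item value before the update). Only the selected item
    changes, so each item can specialize on one prototypical pattern.
    """
    idx = nearest_index(memory, feature)
    read = list(memory[idx])
    memory[idx] = [decay * m + (1.0 - decay) * f
                   for m, f in zip(memory[idx], feature)]
    return idx, read

memory = [[0.0, 0.0], [10.0, 10.0]]
idx, read = read_and_update(memory, [1.0, 1.0])
print(idx, read, memory)
```

Since the read returns a stored item rather than a weighted mixture of all items, an anomalous query cannot be reproduced by blending several normal prototypes.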
Deep Learning Based Personalized Stock Recommender System
ABSTRACT. This research paper introduces a personalized recommender system tailored specifically for the stock market. With the increasing complexity and variety of investment options, individual investors face significant challenges in making informed decisions. Traditional stock market recommendations often offer generic advice that fails to account for investors' unique preferences and risk appetites.
We propose a personalized recommender system that utilizes deep learning techniques to provide customized stock recommendations. Our approach combines collaborative filtering (CF) and content-based (CB) filtering methodologies which form a hybrid system capable of generating personalized recommendations. Collaborative filtering utilizes the behaviour of similar investors to identify stocks that are likely to appeal to the user, while content-based filtering matches stock characteristics with the user's preferences and investment history.
Experimental evaluations demonstrate that the proposed personalized recommender system outperforms existing algorithms and approaches trained on user interaction data taken from the stock market domain, providing investors with tailored stock recommendations that align with their personal needs and preferences.
ECOST: Enhanced CoST Framework for Fast and Accurate Time Series Forecasting
ABSTRACT. We introduce an enhanced forecasting framework for time series using contrastive learning. This method builds on the existing CoST (Contrastive Learning of Disentangled Seasonal-Trend Representations) framework, which, despite its promising performance, is still encumbered by the challenges posed by the intricate nature of time series data. In our proposed framework, we bypass the need for a backbone encoder and directly perform time series decomposition to extract the trend and detrended subseries. These components subsequently undergo independent trend and seasonal feature extraction. This approach ensures a more robust, efficient, and direct representation of inherent time series characteristics. We incorporate Reversible Instance Normalization (RevIN) to improve forecasting accuracy and account for potential distribution bias. Additionally, a new concept, the 'trend queue', is proposed for storing past trend features, improving the learning of trend nuances. Our ECoST model has shown significant improvements, with an increase in prediction accuracy by 8.5\% and a 74\% enhancement in training time efficiency compared to the CoST model. These results were validated through experiments conducted on several real-world time series datasets. This underscores the effectiveness of our approach in providing a more robust and efficient time series forecasting methodology, thereby setting a new benchmark in the field of contrastive learning for time series data.
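Two of the building blocks named in this abstract can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the ECoST implementation: the trend is assumed to be extracted with a centered moving average (the abstract does not say which decomposition is used), and the instance normalization shown is the non-learnable core of RevIN, with its learnable affine parameters omitted. All function names are hypothetical.

```python
def moving_average(x, window):
    """Centered moving average; edges are padded by repeating the end values.

    `window` is assumed to be odd so the output has the same length as `x`.
    """
    half = window // 2
    padded = [x[0]] * half + list(x) + [x[-1]] * half
    return [sum(padded[i:i + window]) / window for i in range(len(x))]

def decompose(x, window=3):
    """Split a series into a trend and a detrended remainder."""
    trend = moving_average(x, window)
    detrended = [xi - ti for xi, ti in zip(x, trend)]
    return trend, detrended

def revin_normalize(x, eps=1e-5):
    """Normalize one series instance by its own mean and standard deviation."""
    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / len(x)
    std = (var + eps) ** 0.5
    return [(xi - mean) / std for xi in x], (mean, std)

def revin_denormalize(y, stats):
    """Reverse the normalization using the statistics stored for the instance."""
    mean, std = stats
    return [yi * std + mean for yi in y]

series = [1.0, 2.0, 3.0, 4.0, 5.0]
trend, detrended = decompose(series, window=3)
normalized, stats = revin_normalize(series)
restored = revin_denormalize(normalized, stats)
```

In a forecasting model the denormalization would be applied to the model's output rather than to the normalized input; the round trip here only shows that the two steps are inverses.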
Adaptive CNN-Based Image Compression Model for Improved Remote Desktop Experience
ABSTRACT. This paper focuses on enhancing the presentation of desktop images on the client-side within remote desktop application scenarios. Remote desktop is a widely adopted tool that improves work efficiency. To reduce the network bandwidth required for transmitting real-time desktop images in remote desktop, image compression technology is extensively utilized. JPEG, a versatile and efficient image compression algorithm based on discrete cosine transform, efficiently eliminates redundancy in areas of high-frequency color changes, which are generally insensitive to the human eye, while preserving most important information in the image. In remote desktop application scenarios, the JPEG compression algorithm is commonly used to compress real-time desktop images. However, there are some problems when compressing desktop images using JPEG: as the compression ratio increases, blocking artifacts and ghosting may appear on the reconstructed desktop image, adversely impacting the viewing experience of remote desktop users. In recent years, deep learning-based image compression technology has been gradually applied, and its efficiency and performance have progressively approached and even surpassed traditional image compression algorithms such as JPEG and BPG. This paper proposes an end-to-end image compression and reconstruction model based on convolutional neural networks, and presents an image compression encoding and decoding method optimized for the subjective perception of the human eye. This method is combined with adaptive spatial and channel attention mechanisms, enabling a more complete preservation of text and texture information in the reconstructed image. Compared with JPEG and some other algorithms based on deep learning, the method proposed in this paper offers superior image perception and a higher compression ratio.
Retinex Meets Transformer: Bridging Illumination and Reflectance Maps for Low-light Image Enhancement
ABSTRACT. Low-light image enhancement (LLIE) aims to reconstruct the original normal image from its low-illumination counterpart. Recently, it has received increasing attention in image restoration. In particular, with the success of deep convolutional neural networks (CNNs), Retinex-based approaches have emerged as a promising line of research in LLIE, since they can transfer adequate prior knowledge from an image captured under sufficient illumination to its low-light version for image enhancement. However, existing Retinex-based approaches usually overlook the correlation between Illumination and Reflectance maps, which are both derived from the same feature extractor, leading to sub-optimal reconstructed image quality. In this study, we propose a novel Transformer architecture for LLIE, termed Bridging Illumination and Reflectance maps Transformer (BIRT). It aims to estimate the correlation between Illumination and Reflectance maps derived from Retinex decomposition within a Transformer architecture via the Multi-Head Self-Attention mechanism. In terms of model structure, the proposed BIRT comprises Retinex-based and Transformer-based sub-networks, which allow our model to elevate the image quality by learning cross-feature dependencies and long-range details between Illumination and Reflectance maps. Experimental results demonstrate that the proposed BIRT model achieves competitive performance on par with the state of the art on the public benchmarking datasets for LLIE.
Make Spoken Document Readable: Leveraging Graph Attention Networks for Chinese Document-Level Spoken-to-Written Simplification
ABSTRACT. As people use language differently when speaking compared to writing, transcriptions generated by automatic speech recognition systems can be difficult to read. While techniques exist to simplify spoken language into written language at the sentence level, research on simplifying spoken language at the document level is limited. Document-level spoken-to-written simplification faces challenges posed by cross-sentence transformations and the long dependencies of spoken documents. This paper proposes a new method called G-DSWS (Graph attention networks for Document-level Spoken-to-Written Simplification), which uses graph attention networks to model the structure of a document explicitly. G-DSWS utilizes structural information from the document to improve the document modeling capability of the encoder-decoder architecture. Experiments on internal and publicly available datasets demonstrate the effectiveness of the proposed model. Human evaluation and a case study show that G-DSWS indeed improves the readability of spoken Chinese documents.
LCformer: Linear Convolutional Decomposed Transformer for Long-Term Series Forecasting
ABSTRACT. Transformer-based methods have shown excellent results in long-term series forecasting, but they still suffer from three problems: high time and space costs; difficulty in analyzing sequence correlation due to entanglement of the original sequence; and a bottleneck in information utilization due to the dot-product pattern of the attention mechanism. To address these problems, we propose a sequence decomposition architecture to identify the different features of sub-series decomposed from the original time series. We then utilize causal convolution to solve the information bottleneck problem caused by the attention mechanism's dot-product pattern. To further improve the efficiency of the model in handling long-term series forecasting, we propose the Linear Convolution Transformer (LCformer) based on a linear self-attention mechanism with O(n) complexity, which exhibits superior prediction performance and lower resource consumption on long-term series prediction problems. Experimental results on two different types of benchmark datasets show that LCformer achieves better prediction performance than state-of-the-art Transformer-based methods and exhibits near-linear complexity for long series prediction.
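The O(n) claim for linear self-attention can be made concrete with a short sketch. The abstract does not specify which linear attention variant LCformer uses, so the version below is an assumption: a kernelized attention with the positive feature map elu(x) + 1, in which the key-value summary is computed once and reused for every query. The helper names are hypothetical, and the causal convolution and decomposition parts of the model are not shown.

```python
import math

def phi(v):
    """Positive feature map elu(x) + 1, applied element-wise (an assumed choice)."""
    return [x + 1.0 if x > 0 else math.exp(x) for x in v]

def linear_attention(Q, K, V):
    """Attention in O(n) sequence length: summarize keys and values once."""
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # S[a][b] = sum_j phi(K_j)[a] * V_j[b]
    z = [0.0] * d_k                        # z[a]    = sum_j phi(K_j)[a]
    for k, v in zip(K, V):
        fk = phi(k)
        for a in range(d_k):
            z[a] += fk[a]
            for b in range(d_v):
                S[a][b] += fk[a] * v[b]
    out = []
    for q in Q:
        fq = phi(q)
        denom = sum(fq[a] * z[a] for a in range(d_k))
        out.append([sum(fq[a] * S[a][b] for a in range(d_k)) / denom
                    for b in range(d_v)])
    return out

def quadratic_reference(Q, K, V):
    """Same result computed pairwise in O(n^2), used only as a check."""
    out = []
    for q in Q:
        fq = phi(q)
        w = [sum(a * b for a, b in zip(fq, phi(k))) for k in K]
        total = sum(w)
        out.append([sum(wj * v[b] for wj, v in zip(w, V)) / total
                    for b in range(len(V[0]))])
    return out

Q = [[0.2, -0.5], [1.0, 0.3], [-0.7, 0.1]]
K = [[0.4, 0.4], [-0.3, 0.9], [0.0, -1.2]]
V = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
fast = linear_attention(Q, K, V)
slow = quadratic_reference(Q, K, V)
```

The two functions return the same values; the difference is that `linear_attention` never forms the n-by-n weight matrix, so its cost grows linearly with the number of time steps.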
Multi-Level Augmentation Boosts Hybrid CNN-Transformer Model for Semi-Supervised Cardiac MRI Segmentation
ABSTRACT. Over the past few years, many supervised deep learning algorithms based on Convolutional Neural Networks (CNN) and Vision Transformers (ViT) have achieved remarkable progress in the field of clinical-assisted diagnosis. However, the specific application of these algorithms e.g. ViT which requires a large amount of data in the training process is greatly limited due to the high cost of medical image annotation. To address this issue, this paper proposes an effective semi-supervised medical image segmentation framework, which combines two models with different structures, i.e. CNN and Transformer, and integrates their abilities to extract local and global information through a mutual supervision strategy. Based on this heterogeneous dual-network model, we employ multi-level image augmentation to expand the dataset, alleviating the model’s demand for data. Additionally, we introduce an uncertainty minimization constraint to further improve the model’s robustness, and incorporate an equivariance regularization module to encourage the model to capture semantic information of different categories in the images. In public benchmark tests, we demonstrate that the proposed method outperforms the recently developed semi-supervised medical image segmentation methods in terms of specific metrics such as Dice coefficient and 95% Hausdorff Distance for segmentation performance.
Fashion Trend Forecasting Based on Multivariate Attention Fusion
ABSTRACT. Garment fashion trend prediction is a common yet important issue in the fashion industry. Its goal is to capture the trend changes of different garment attributes and achieve predictions of their future popularity. The key to successful prediction lies in identifying the underlying external factors that influence fashion changes. Existing garment fashion trend prediction models do not fully integrate comments and user preferences from social media platforms, which results in the neglect of relevant external factors and affects the accuracy of fashion popularity prediction. In response to this problem, this paper proposes a garment fashion trend prediction model based on multivariate attention fusion. Firstly, it combines a large amount of diverse information posted by users on fashion platforms like Chictopia. Secondly, it employs the GLU module for nonlinear mapping of static covariates that do not vary over time, suppressing irrelevant information while extracting features. Additionally, to enhance the extraction of contextual features from sequential data, extended causal convolutions are used to extract features from the dynamic fashion style trend sequences. Subsequently, a multivariate cross-attention block is designed to capture the mapping relationship between dynamic and static variables in the input, achieving feature fusion. Finally, a multivariate attention fusion Transformer model is used to predict fashion popularity. Experimental results demonstrate that this method accurately predicts future trends on the SFS, FIT, and Geo datasets, with improvements of 8.79% and 11.77% in MAE and MAPE evaluation metrics, respectively, compared to the best existing fashion trend prediction models.
CATS: Connection-aware and Interaction-based Text Steganalysis in Social Networks
ABSTRACT. Generative linguistic steganography in social networks carries substantial potential for abuse and poses regulatory risks, with serious implications for information security, especially in the era of large language models.
Many works have explored detecting steganographic texts with progressively enhanced imperceptibility, but they can only achieve poor performance in real social network scenarios.
One key reason is that these methods primarily focus on linguistic features, which are extremely insufficient owing to the fragmentation of social texts.
In this paper, we propose a novel method called CATS (Connection-aware and interAction-based Text Steganalysis) to effectively detect potentially malicious steganographic texts. CATS captures social network connection information by graph representation learning, enhances linguistic features by contrastive learning, and fully integrates these features via a novel feature interaction module. Our experimental results demonstrate that CATS outperforms existing methods by exploiting social network graph structure features and interactions in social network environments.
ABSTRACT. Image compression is indispensable in many visual applications. Recently, learned image compression (LIC) using deep learning has surpassed traditional image codecs such as JPEG in terms of compression efficiency but at the cost of increased complexity. Thus, employing LIC in resource-limited environments is challenging. In this paper, we propose an LIC model using a look-up table (LUT) to effectively reduce the complexity. Specifically, we design an LUT replacing the entropy decoder to an by analyzing its input characteristics and accordingly developing a dynamic sampling method for determining the indices of the LUT. Experimental results show that the proposed method achieves better compression efficiency than traditional codecs with faster runtime than LIC models.
AttnOD: An Attention-based OD Prediction Model with Adaptive Graph Convolution
ABSTRACT. In recent years, with the continuous growth of traffic scale, the prediction of passenger demand has become an important problem. However, most previous methods only considered the passenger flow in a region or at one point, which cannot effectively model the detailed demands from origins to destinations. In contrast, this paper focuses on a challenging yet worthwhile task called Origin-Destination (OD) prediction, which aims to predict the future traffic demand between each pair of regions. To this end, an Attention-based OD prediction model with adaptive graph convolution (AttnOD) is designed. Specifically, the model follows an Encoder-Decoder structure, which encodes historical input as hidden states and decodes them into future predictions. Within each block of the encoder and the decoder, adaptive graph convolution is used to capture spatial dependencies, and a self-attention mechanism is used to capture temporal dependencies. In addition, a cross-attention module is designed to reduce cumulative propagation error in prediction. Comparative experiments on the Beijing subway and New York taxi datasets show that the AttnOD model obtains better performance than the baselines under most evaluation indicators. Furthermore, ablation experiments verify the effect of each module.
CMMix: Cross-Modal Mix Augmentation between Images and Texts for Visual Grounding
ABSTRACT. Visual grounding (VG) is a representative multi-modal task that has recently gained increasing attention. Nevertheless, existing works still under-perform due to insufficient training data. To address this, some researchers have attempted to generate new samples by combining two (image, text) pairs at a time, inspired by the success of the uni-modal CutMix series of data augmentation methods. However, these methods mix images and texts separately and neglect their contextual correspondence. To overcome this limitation, we propose a novel data augmentation method for the visual grounding task, called Cross-Modal Mix (CMMix). Our approach employs a fine-grained mix paradigm, where sentence-structure analysis is used to locate the central noun parts in texts, and their corresponding image patches are drafted through noun-specific bounding boxes in VG. In this way, CMMix maintains matching correspondence during the mix operation, thereby retaining the coherent relationship between images and texts and resulting in richer and more meaningful mixed samples. Furthermore, we employ a filtering-sample-by-loss strategy to enhance the effectiveness of our method. Experiments on four VG benchmarks (ReferItGame, RefCOCO, RefCOCO+, and RefCOCOg) verify the superiority of our method.
A Relation-oriented Approach for Complex Entity Relation Extraction
ABSTRACT. Entity relation extraction targets the extraction of structured triples from unstructured text, and is the start of the entire knowledge graph lifecycle. In recent advances, Machine Reading Comprehension (MRC) based approaches provide new paradigms for entity relation extraction and achieve state-of-the-art performance. Targeting the nested entities, overlapping relationships, and distant entities found in recipe texts, this study proposes a relation-oriented approach for complex entity relation extraction. This approach addresses the entity redundancy problem caused by traditional pipeline models, which are entity-first methods. By predicting the start and end positions of entities, it solves the problem of nested entities that cannot be identified by traditional sequence labeling methods. Finally, the error propagation issue is mitigated by the triple determination module. We conduct extensive experiments on multiple datasets in both English and Chinese, and the experimental results show that our method significantly outperforms the baseline model in terms of precision, recall, and micro-F1 score.
A Revamped Sparse Index Tracker leveraging K-Sparsity and Reduced Portfolio Reshuffling
ABSTRACT. In finance, sparse index tracking (SIT) is a specialized, low-cost, and effective passive strategy that seeks to replicate a financial index using a representative subset of its constituents. However, the existing IT algorithms have two imperfections: (1) investors cannot explicitly control the number of holding assets, and (2) excess purchases and sales may occur during rebalancing. Owing to these deficiencies, this paper first proposes a more practical formulation (a constrained optimization problem). Afterwards, the paper develops the corresponding algorithm, called index tracker with four portfolio constraints via projected gradient descent (IT4-PGD), for solving the constrained optimization problem. With IT4-PGD, investors can freely define the settings of the portfolio, including the number of constituents and turnover on each constituent. Simulation results show that IT4-PGD outperforms existing methods in terms of the magnitude of the daily tracking error (MDTE) and the accumulative turnover ratio (ATR).
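Projected gradient descent with an explicit limit on the number of holdings can be sketched as follows. This is not IT4-PGD: the abstract does not list its four constraints, so the sketch assumes only long-only weights that sum to one with at most `k` nonzero entries, and it ignores the turnover constraint entirely. The projection keeps the `k` largest entries and projects them onto the probability simplex; the function names, step size, and toy data are hypothetical.

```python
def project_simplex(v):
    """Euclidean projection of `v` onto {w : w >= 0, sum(w) = 1} (sort-based)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - 1.0) / j
        if uj - t > 0:
            theta = t  # the last j that satisfies the condition sets the threshold
    return [max(x - theta, 0.0) for x in v]

def project_k_sparse_simplex(v, k):
    """Keep the k largest entries, project them onto the simplex, zero the rest."""
    keep = sorted(range(len(v)), key=lambda i: v[i], reverse=True)[:k]
    projected = project_simplex([v[i] for i in keep])
    w = [0.0] * len(v)
    for i, p in zip(keep, projected):
        w[i] = p
    return w

def track_index(returns, index_returns, k, step=0.1, iters=500):
    """Minimize the mean squared tracking error subject to the constraints above."""
    n = len(returns[0])
    w = project_k_sparse_simplex([1.0 / n] * n, k)
    for _ in range(iters):
        residual = [sum(r * wi for r, wi in zip(row, w)) - y
                    for row, y in zip(returns, index_returns)]
        grad = [2.0 * sum(res * row[j] for res, row in zip(residual, returns))
                / len(returns) for j in range(n)]
        w = project_k_sparse_simplex([wi - step * g for wi, g in zip(w, grad)], k)
    return w

# Toy data: 4 days, 3 assets; the index is an equal mix of assets 0 and 1.
returns = [[0.01, 0.02, -0.03], [0.00, -0.01, 0.02],
           [0.02, 0.01, 0.01], [-0.01, 0.03, -0.02]]
index_returns = [0.5 * r[0] + 0.5 * r[1] for r in returns]
weights = track_index(returns, index_returns, k=2)
```

Every iterate is feasible by construction, so the holding count is controlled directly by `k` rather than indirectly through a penalty weight, which is the first of the two shortcomings the abstract raises.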
Anomaly detection of fixed-wing unmanned aerial vehicle (UAV) based on cross-feature-attention LSTM network
ABSTRACT. With the gradual penetration of unmanned aerial vehicle (UAV) technology and its related applications in people’s lives, the safety of UAVs has become an important research focus. In this paper, we present an anomaly detection method based on cross-feature-attention LSTM neural networks. In an unsupervised setting, we use two types of networks to extract temporal and spatial features from flight data and predict future states to detect abnormal flight behavior. We conduct experiments on real flight data, the Air Lab Fault and Anomaly (ALFA) dataset, using multiple sets of different feature combinations. The results indicate that our method can maintain high performance across different feature combinations, achieving an average accuracy of 0.96 and a response time of 5.01 seconds.
Q-Learning Based Adaptive Scheduling Method For Hospital Outpatient Clinics
ABSTRACT. Proper selection of the number of Service Providers (SPs) such as doctors, registration windows, and examination equipment in outpatient clinics can improve the efficiency of services and promote the sharing and effective use of medical resources. In this paper, an adaptive scheduling model for hospital outpatient clinics is proposed that selects the number of SPs while minimizing total cost. First, the M/G/K model of the outpatient queuing process is constructed based on queuing theory, where M denotes the Poisson process of patient arrivals, G denotes the general distribution followed by the service time, and K is the number of SPs. Second, an objective function that minimizes costs such as waiting, setup, and SP usage is established. The optimal number of SPs is solved with the Q-learning algorithm in reinforcement learning (RL). Finally, the simulation verifies that as the cost of service gradually increases, the system will favor fewer SPs to perform the service to reduce the total cost. This scheduling model can not only adjust the scheduling scheme according to the different service costs to maximize the economic efficiency of the hospital, but can also be used to plan hospital staffing.
Temporal Task Graph based Dynamic Agent Allocation for Applying Behavior Trees in Multi-agent Games
ABSTRACT. Behavior Trees (BTs) have gained widespread applications across diverse domains, facilitating the decomposition of complex tasks into manageable subtasks. However, an inherent challenge in maximizing the performance of BTs lies in dynamically allocating agents to subtasks as time progresses. This allocation predicament is compounded by the intricate nature of game states and the temporal variations in subtask activation. In this paper, we propose a novel approach that combines temporal task graphs with reinforcement learning to dynamically allocate agents among subtasks in BT. We employ a temporal task graph to model the dynamic activation of subtasks, where encoded vectors are multiplied by the agent's encoded observation. This enables each agent to be assigned to a specific subtask while considering comprehensive information about all subtasks. Moreover, we aggregate the Q-values of selected subtasks for all agents, leveraging this information to compute a total loss for updating the entire network. To evaluate the efficacy of our approach, we conducted extensive experiments on the challenging benchmark provided by Google Research Football. The results clearly demonstrate a significant performance improvement in BTs when leveraging our proposed framework.
CKR-Calibrator: Convolution Kernel Robustness Evaluation and Calibration
ABSTRACT. Recently, Convolution Neural Networks (CNN) have achieved excellent performance in some areas of computer vision, including face recognition, character recognition, and autonomous driving. However, there are still many CNN-based models that cannot be deployed in real-world scenarios due to poor robustness. In this paper, focusing on the classification task, we attempt to evaluate and optimize the robustness of CNN-based models from a new perspective: the convolution kernel. Inspired by the discovery that the root cause of the model decision error lies in the wrong response of the convolution kernel, we propose a convolution kernel robustness evaluation metric based on the distribution of convolution kernel responses. Then, we devise the Convolution Kernel Robustness Calibrator, termed as CKR-Calibrator, to optimize key but not robust convolution kernels. Extensive experiments demonstrate that CKR-Calibrator improves the accuracy of existing CNN classifiers by 1%-4% in clean datasets and 1%-5% in corrupt datasets, and improves the accuracy by about 2% over SOTA methods. The evaluation and calibration source code is open-sourced at https://github.com/cym-heu/CKR-Calibrator.
A Causality-Based Interpretable Cognitive Diagnosis Model
ABSTRACT. The primary objective of cognitive diagnosis is to evaluate students' cognitive states and processes during learning, enabling teachers and researchers to gain a deeper understanding of students' learning needs and challenges, and offer personalized learning support. Nevertheless, deep learning-based cognitive diagnosis models are inherently opaque, posing challenges in providing psychological insights into the reasoning behind predicted outcomes. This lack of transparency hampers researchers' analysis of students' learning situations. We employ feature engineering to derive three meaningful and interpretable parameters, i.e., skill mastery, exercise difficulty, and exercise discrimination. Drawing inspiration from the Bayesian classification network, the proposed approach employs a Tree-Augmented Naive Bayes Classifier to predict students' performance. This approach meets the interpretability criteria while enhancing predictive accuracy. In the experiment, we evaluate the effectiveness of the proposed approach by comparing it with classical and state-of-the-art cognitive diagnostic models on four open benchmark datasets. We conduct ablation studies on each feature to examine its contribution to student performance prediction. Thus, the proposed model (CBICDM) has great potential for providing adaptive and personalized instructions with causal reasoning in real-world educational systems.
Spatial and Frequency Domains Inconsistency Learning for Face Forgery Detection
ABSTRACT. The rapid development of face forgery technology has attracted widespread attention. Current face forgery detection methods, whether based on Convolutional Neural Networks (CNN) or Vision Transformers (ViT), are biased towards extracting local or global features, respectively. These methods are relatively one-sided, and the extracted features are not robust and general enough. In this work, we exploit intra-frame inconsistency as well as inter-modal inconsistency between spatial and frequency domains to improve performance and generalization for face forgery detection. We efficiently extract intra-frame inconsistency by utilizing the capabilities of Swin Transformer. Its self-attention mechanism and attention mapping between patch embeddings naturally represent the inconsistency relations, allowing for simultaneous modeling of both local and global features, making it our ideal choice. Meanwhile, we also introduce frequency information to further improve detection performance, and design a Cross-Attention Feature Fusion (CAFF) module to exploit the inconsistency between spatial and frequency modalities to extract more general feature representations. Extensive experiments demonstrate the effectiveness of the proposed method.
ABSTRACT. The present weakly supervised methods for Temporal Action Localization are primarily responsible for capturing the temporal context. However, these approaches have limitations in capturing semantic context, resulting in the risk of ignoring snippets that are far apart but share the same action categories. To address this issue, we propose an action label propagation network utilizing sparse graph networks to effectively explore both temporal and semantic information in videos. The proposed SGLP-Net comprises two key components. One is the multi-scale temporal feature embedding module, a novel method that extracts both local and global temporal features of the videos during the initial stage using CNN and self-attention and serves as a generic module. The other is an action label propagation mechanism, which uses graph networks for feature aggregation and label propagation. To avoid the issue of excessive feature completeness, we optimize training using sparse graph convolutions. Extensive experiments are conducted on THUMOS14 and ActivityNet1.3 benchmarks, among which advanced results demonstrate the superiority of the proposed method. Our code and model will be made available to the public after acceptance.
RoBrain: Towards Robust Brain-to-Image Reconstruction via Cross-Domain Contrastive Learning
ABSTRACT. With the development of neuroimaging technology and deep learning methods, neural decoding with functional Magnetic Resonance Imaging (fMRI) of the human brain has attracted more and more attention. The neural reconstruction task, which aims to reconstruct stimulus images from fMRI, is one of the most challenging tasks in neural decoding. Due to the instability of neural signals, fMRI trials collected under the same stimulus prove to be very different, which leads to poor robustness and generalization ability in existing models. In this work, we propose a robust brain-to-image model based on cross-domain contrastive learning. With deep neural network (DNN) features as paradigms, our model can stably extract stimulus features and generate reconstructed images via DCGAN. Experiments on the benchmark Deep Image Reconstruction dataset show that our method can significantly enhance the robustness of reconstruction.
A Feature Pyramid Fusion Network Based on Dynamic Perception Transformer for Retinal OCT Biomarker Image Segmentation
ABSTRACT. Quantitative analysis of biomarkers in Optical Coherence Tomography (OCT) images plays an important role in the diagnosis and treatment of retinal diseases. However, biomarker segmentation in retinal OCT images remains very difficult because of the large variations in size and shape of retinal biomarkers, blurred boundaries, low contrast, and speckle noise interference. This paper proposes a novel Multi-scale Local-Global Transformer network (MsLGT-Net) for biomarker segmentation in retinal OCT images. The network combines the proposed Multi-scale Fusion Attention (MFA) module and Local-Global Transformer (LGT) module to tackle the challenges of biomarker segmentation. Specifically, the proposed MFA module aims to enhance the network’s ability to learn multi-scale features of retinal biomarkers by effectively combining the local detail information and contextual semantic information of biomarkers at different scales, and to improve the representation ability for different classes of biomarkers. The LGT module is designed to learn local and global information adaptively from multi-scale fused features to meet the challenges of changing biomarker scales. Our proposed method is validated on one local dataset. The experimental results show that the method is more effective than other fully supervised methods.
Enhancing Camera Position Estimation by Multi-View Pure Rotation Recognition and Automated Annotation Learning
ABSTRACT. Pure rotational anomaly recognition is a critical problem in 3D visual computation, requiring precise recognition for reliable camera pose estimation and robust 3D reconstruction. Current techniques primarily focus on model selection, parallax angle, and intersection constraints within two-view geometric models when identifying pure rotational motion. This paper proposes a multi-view pure rotational detection method that draws upon two-view rotation-only recognition indicators to identify pure rotational views that cause pose estimation anomalies. An automatic data annotation and training strategy for rotation-only anomaly recognition in multi-view pose estimation data is also introduced. Our experiments demonstrate that our proposed model for rotation-only anomaly recognition achieves an accuracy of 91% on the test set and is highly effective in improving the precision, resilience, and performance of camera pose estimation, 3D reconstruction, SLAM, SfM, object tracking, and other computer vision tasks. The effectiveness of our approach is validated through comparison with related approaches in a simulated camera motion trajectory experiment.
VFIQ: A Novel Model of ViT-FSIMc Hybrid Siamese Network for Image Quality Assessment
ABSTRACT. The goal of Image Quality Assessment (IQA) is to measure how humans perceive the quality of images. In this paper, we propose a new model named VFIQ (a ViT-FSIMc Hybrid Siamese Network for Full Reference IQA) that combines signal processing and learning-based approaches, the two categories of IQA algorithms. Specifically, we design a hybrid Siamese network that leverages the Vision Transformer (ViT) and the feature similarity index measurement (FSIMc). To evaluate the performance of the proposed VFIQ model, we first pre-train the ViT module on the PIPAL dataset, and then evaluate our VFIQ model on several popular benchmark datasets including TID2008, TID2013, and LIVE. The experimental results show that our VFIQ model outperforms the state-of-the-art IQA models in the commonly used correlation metrics of PLCC, KRCC, and SRCC. We also demonstrate the usefulness of our VFIQ model in different vision tasks, such as image recovery and generative model evaluation.
Spiking Reinforcement Learning for Weakly-supervised Anomaly Detection
ABSTRACT. Weakly-supervised Anomaly Detection (AD) has achieved significant performance improvement compared to unsupervised methods by harnessing very little additional labeling information. However, most existing methods ignore anomalies in unlabeled data by simply treating the whole unlabeled set as normal; that is, they fail to resist such noise that may considerably disturb the learning process, and more importantly, they cannot extract key anomaly features from these unlabeled anomalies, which are complementary to those labeled ones. To solve this problem, a spiking reinforcement learning framework for weakly-supervised AD, named ADSD, is proposed. Compared with artificial neural networks, the spiking neural network can effectively resist input perturbations due to its unique coding methods and neuronal characteristics. From this point of view, by using spiking neurons with noise filtering and threshold adaptation, as well as a multi-weight evaluation method to discover the most suspicious anomalies in unlabeled data, ADSD achieves end-to-end optimization for the utilization of a few labeled anomaly data and rare unlabeled anomalies in complex environments. The agent in ADSD has robustness and adaptability when exploring potential anomalies in the unknown space. Extensive experiments show that our method ADSD significantly outperforms four popular baselines in various environments while maintaining good robustness and generalization performance.
Resource-aware DNN Partitioning for Privacy-sensitive Edge-Cloud Systems
ABSTRACT. With recent advances in deep neural networks (DNNs), there has been a significant increase in sophisticated robotic and IoT applications leveraging AI with edge-cloud infrastructures. However, deploying large DNN models on typically resource-constrained edge devices is challenging due to their computational, power, and application-specific privacy limitations. This work focuses on facilitating the secure training and inference of advanced computer vision models on edge devices by offloading intensive computational burden while preserving data privacy. This is achieved by integrating privacy-preserving techniques with model partitioning in the edge computing paradigm. While existing approaches have deployed a partial DNN on an edge device while processing the remaining portion of the DNN on the cloud, these works focus mainly on communication and power efficiency. DNN partitioning based on an edge device’s privacy requirements and computational resources has not been widely explored in the literature. In this paper, we propose awareSL, a model partitioning framework that partitions a DNN based on available edge computational resources, preserving the privacy of input samples while maintaining high accuracy. Our empirical evaluation of four popular DNN architectures illustrates that awareSL significantly reduces memory usage and computation costs without sacrificing accuracy. We also demonstrate the feasibility and privacy-preserving capability of awareSL against well-known privacy attacks under realistic settings.
Link Prediction Based on the Sub-graphs Learning with Fused Features
ABSTRACT. As one of the important research methods in the area of knowledge graph completion, link prediction aims to capture the structural information or the attribute information of nodes in the network to predict the link probability between nodes. In particular, graph neural networks based on sub-graphs provide a popular approach to representation learning for link prediction tasks. However, they cannot solve the resource consumption problem in large graphs, nor do they combine global structural features, since they often simply stitch attribute features and embeddings together to predict. Therefore, this paper proposes a novel link prediction model based on Sub-graphs Learning with Fused Features, named SLFF for short. In particular, the proposed model utilizes random walks to extract the sub-graphs to reduce the overhead in the process. Moreover, it utilizes Node2Vec to process the entire graph and obtain the global structural characteristics of each node. Afterward, the SLFF model utilizes the existing embedding to reconstruct the embedding according to the neighborhood defined by the graph structure and node attribute space. Finally, the SLFF model combines the attribute characteristics of the node with its structural characteristics. Extensive experiments on datasets demonstrate that the proposed SLFF performs better than the state-of-the-art approaches.
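The random-walk sub-graph extraction mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not specify the walk procedure, so everything here is an assumption made for illustration: the function name `random_walk_subgraph`, the parameters `walk_length` and `num_walks`, the uniform choice of the next neighbor, and the use of the induced sub-graph on the visited nodes. It is not the SLFF implementation.

```python
import random


def random_walk_subgraph(adj, start_nodes, walk_length, num_walks, seed=0):
    """Collect nodes reached by short random walks and return the induced sub-graph.

    adj is an adjacency dict {node: [neighbors]}. All parameter names are
    hypothetical; the abstract does not describe the actual procedure.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    visited = set(start_nodes)
    for start in start_nodes:
        for _ in range(num_walks):
            node = start
            for _ in range(walk_length):
                neighbors = adj.get(node, [])
                if not neighbors:
                    break  # dead end: stop this walk early
                node = rng.choice(neighbors)  # uniform next step (an assumption)
                visited.add(node)
    # Keep only edges whose endpoints were both visited (induced sub-graph).
    return {u: [v for v in adj.get(u, []) if v in visited] for u in sorted(visited)}


# Toy graph; the candidate link (0, 3) supplies the two start nodes.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3, 5], 5: [4]}
sub = random_walk_subgraph(adj, [0, 3], walk_length=2, num_walks=3)
```

The size of the extracted sub-graph is bounded by the number and length of the walks rather than by the size of the full graph, which is the kind of overhead reduction the abstract refers to.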
Leveraging Hierarchical Similarities for Contrastive Clustering
ABSTRACT. Recently, contrastive clustering has demonstrated high performance in the field of deep clustering due to its powerful feature extraction capabilities. However, existing contrastive clustering methods suffer from inter-class conflicts and often produce suboptimal clustering outcomes due to the disregard of latent class information.
To address this issue, we propose a novel method called Contrastive learning using Hierarchical data similarities for Deep Clustering (CHDC), consisting of three modules, namely the inter-class separation enhancer, the intra-class compactness enhancer, and the clustering module. Specifically, to induct the latent class information by utilizing the sample pairs with data similarities, the inter-class separation enhancer and the intra-class compactness enhancer handle negative and positive sample pairs, respectively, with distinct hierarchical similarities.
Additionally, the clustering module aims to ensure the alignment of cluster assignments between samples and their neighboring samples. By designing these three modules that work collaboratively, inter-class conflicts are alleviated, allowing CHDC to learn more discriminative features. Lastly, we design a novel update method for positive sample pairs to reduce the likelihood of introducing erroneous information. To evaluate the performance of CHDC, we conduct extensive experiments on five widely adopted image classification datasets. The experimental results demonstrate the superiority of CHDC compared to state-of-the-art methods. Moreover, ablation studies demonstrate the effectiveness of the proposed modules.
A Framework of Large-Scale Peer-to-Peer Learning System
ABSTRACT. Federated learning (FL) is a distributed machine learning paradigm in which numerous clients train a model dispatched by a central server while retaining the training data locally. Nonetheless, the failure of the central server can disrupt the training framework. Peer-to-peer approaches enhance the robustness of the system, as all clients interact directly with other clients without a server. However, a downside of these peer-to-peer approaches is their low efficiency. Communication among a large number of clients is significantly costly, and the synchronous learning framework becomes unworkable in the presence of stragglers. In this paper, we propose a semi-asynchronous peer-to-peer learning system (P2PLSys) suitable for large-scale clients. This system features a server that manages all clients but does not participate in model aggregation. The server distributes a partial client list to selected clients that have completed local training for local model aggregation. Subsequently, clients adjust their own models based on staleness and communicate through a secure multi-party computation protocol for secure aggregation. Through our experiments, we demonstrate the effectiveness of P2PLSys for image classification problems, achieving a similar performance level to classical FL algorithms and centralized training.
Efficient Chinese Relation Extraction with Multi-entity Dependency Tree Pruning and Path-Fusion
ABSTRACT. Relation Extraction (RE) is a crucial task in natural language processing that aims to predict the relationship between two given entities. In recent years, a large majority of approaches have utilized syntactic information, particularly dependency trees, to enhance relation extraction by providing superior semantic guidance. Compared with other fields, Chinese texts are more semantically complex and contain multiple pairs of entities. However, many studies only focus on removing extraneous information from the dependency tree that pertains to a single entity pair. We hypothesize that preserving the semantic and structural interaction between multiple entity pairs in the tree is more conducive to the identification of the current entity pair relationship. Therefore, we propose a new pruning strategy called Multi-entity dependency Tree Pruning and path-Fusion (MTPF), which preserves the ancestor nodes of each entity pair to their lowest common ancestor, as well as the shortest path from that node to each entity. Then we introduce A-GCN as the encoder for the syntax tree obtained above, and the idea of multi-classification sequence as the decoder. Experimental results on two Chinese benchmark datasets, the financial dataset constructed by ourselves and DUIE1.0, demonstrate the effectiveness of our pruning strategy for CRE, where our approach outperforms strong dependency-tree baselines and achieves state-of-the-art results on both datasets.
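The pruning idea in the abstract above (keep the nodes between each entity and the lowest common ancestor of the pair, then merge over several pairs) can be sketched as follows. This is a simplified illustration and not the MTPF algorithm itself: the head-index encoding of the tree, the function names, and the plain set union used to combine the paths of several entity pairs are all assumptions.

```python
def path_to_root(head, node):
    """Return the list of nodes from `node` up to the root (root has head -1)."""
    path = [node]
    while head[path[-1]] != -1:
        path.append(head[path[-1]])
    return path


def lca_path_nodes(head, a, b):
    """Return (lca, nodes on the tree path a -> lca -> b)."""
    path_a = path_to_root(head, a)
    ancestors_a = set(path_a)
    path_b = []
    node = b
    # Climb from b until we hit an ancestor of a; that node is the LCA.
    while node not in ancestors_a:
        path_b.append(node)
        node = head[node]
    lca = node
    keep = set(path_b)
    keep.update(path_a[: path_a.index(lca) + 1])
    return lca, keep


def prune_tree(head, entity_pairs):
    """Union of the LCA paths of all entity pairs (hypothetical fusion rule)."""
    keep = set()
    for a, b in entity_pairs:
        keep |= lca_path_nodes(head, a, b)[1]
    return sorted(keep)


# head[i] is the parent of token i in a toy dependency tree; token 0 is the root.
head = [-1, 0, 0, 1, 1, 2, 5, 2, 3]
single_pair = prune_tree(head, [(3, 4)])          # [1, 3, 4]
multi_pair = prune_tree(head, [(3, 4), (4, 6)])   # [0, 1, 2, 3, 4, 5, 6]
```

With a single pair only the local path survives, while adding a second pair keeps the nodes that connect the two pairs, which is the multi-entity interaction the abstract argues for. Tokens 7 and 8 lie on no entity path and are pruned in both cases.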
Multi-Task Learning Network for Automatic Pancreatic Tumor Segmentation and Classification with Inter-Network Channel Feature Fusion
ABSTRACT. Pancreatic cancer is a malignant tumor with a high mortality rate. Therefore, accurately identifying pancreatic cancer is of great significance for early diagnosis and treatment. Currently, several methods have been developed using network structures based on multi-task learning to address tumor recognition issues. One common approach is to use the encoding part of a segmentation network as shared features for both segmentation and classification tasks. However, because segmentation tasks focus on detailed features while classification tasks require more global features, the shared features may not provide a sufficiently discriminative feature representation for the classification task. To address the above challenges, we propose a novel multi-task learning network that leverages the correlation between the segmentation and classification networks to enhance the performance of both tasks. Specifically, the classification task takes the tumor region images extracted from the segmentation network's output as input, effectively capturing the shape and internal texture features of the tumor. Additionally, a feature fusion module is added between the networks to facilitate information exchange and fusion. We evaluated our model on 82 clinical CT image samples. Experimental results demonstrate that our proposed multi-task network achieves excellent performance with a Dice similarity coefficient (DSC) of 88.42% and a classification accuracy of 85.71%.
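For readers unfamiliar with the Dice similarity coefficient reported above, a minimal sketch of the standard definition, DSC = 2|A ∩ B| / (|A| + |B|), is given below for two binary masks. The function name and the toy masks are illustrative only, and the handling of two empty masks is a convention chosen here, not something stated in the abstract.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient between two binary masks given as nested lists."""
    pred_flat = [bool(v) for row in pred for v in row]
    target_flat = [bool(v) for row in target for v in row]
    if len(pred_flat) != len(target_flat):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p and t for p, t in zip(pred_flat, target_flat))
    total = sum(pred_flat) + sum(target_flat)
    # Convention (an assumption): two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy 2x3 masks: each has 3 foreground pixels, 2 of which overlap.
pred = [[0, 1, 1], [0, 1, 0]]
target = [[0, 1, 0], [0, 1, 1]]
dsc = dice_coefficient(pred, target)  # 2 * 2 / (3 + 3)
```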
Optimizing 3D UAV Path Planning: An Approach Incorporating Multiple Mechanisms and Beluga Whale Optimizer
ABSTRACT. The goal of the 3D UAV path planning problem is to assist the UAV in planning a flight path with the lowest total overhead cost. In this paper, we present a novel approach to address the problem by incorporating flight distance, threat cost, flight altitude and path smoothness constraints into a comprehensive cost function. The current popular metaheuristic algorithm is utilized to solve for the closest globally optimal UAV flight path. To overcome the challenges of local optima and slow convergence associated with the conventional Beluga Whale Optimizer (BWO), this paper proposes a modified beluga whale optimizer (OGGBWO) based on a random opposition-based learning strategy, an adaptive Gauss variational operator and an elitist group genetic strategy. Extensive experiments conducted on the CEC2022 test set and four distinct terrain scenarios of varying complexity demonstrate that the OGGBWO algorithm outperforms classical and state-of-the-art metaheuristics. It achieves superior optimization performance across all 12 CEC2022 test functions and exhibits exceptional convergence in generating flight paths with the lowest total cost function in diverse terrain scenarios.
A lightweight text classification model based on Label Embedding Attentive mechanism
ABSTRACT. This paper presents a lightweight model based on the self-attention mechanism for text classification tasks. In our model, we incorporate auxiliary information of the label through the label embedding method, enabling the model to capture the contextual language variations of the same word. Furthermore, we address the issue of misclassification of similar texts by introducing the contrastive loss function, in conjunction with the traditional cross-entropy loss function. Experimental evaluations are conducted on multiple datasets, comparing our model against others with similar parameter scales, thus demonstrating the effectiveness of the proposed approach.
Reinforcement Learning-Based Consensus-Reaching in Large-Scale Social Networks
ABSTRACT. Social networks in present-day industrial environments encompass a wide range of personal information that has significant research and application potential. One notable challenge in the domain of opinion dynamics of social networks is achieving convergence of opinions to a limited number of clusters. In this context, designing the communication topology of the social network in a distributed manner is particularly difficult. To address this problem, this paper proposes a novel perception model for agents. The proposed model, which is based on bidirectional recurrent neural networks, can adaptively reweight the influence of perceived neighbors in the convergence process of opinion dynamics. Additionally, effective differential reward functions are designed to optimize three objectives: convergence degree, connectivity, and cost of convergence. Lastly, a multi-agent exploration and exploitation algorithm based on policy gradient is designed to optimize the model. Based on the reward values in inter-agent interaction processes, the agents can adaptively learn the neighbor reweighting strategy with multi-objective trade-off abilities. Extensive simulations demonstrate that the proposed method can effectively reconcile conflicting opinions among agents and accelerate convergence.
PatchFinger: A Model Fingerprinting Scheme based on Adversarial Patch
ABSTRACT. As deep neural networks (DNNs) gain great popularity and importance, protecting their intellectual property has become a persistent topic. Previous model watermarking schemes based on backdoors require explicit embedding of the backdoor, which changes the structure and parameters. Model fingerprinting based on adversarial examples does not require any modification of the model, but is limited by the characteristics of the original task and is not versatile enough. We find that an adversarial patch can be regarded as an inherent backdoor and can achieve the output of specific injected categories. Inspired by this, we propose PatchFinger, a model fingerprinting scheme based on an adversarial patch, which is applied to the original samples as a model fingerprint through a specific fusion method. As a model fingerprinting scheme, PatchFinger does not sacrifice the accuracy of the source model, and the characteristics of the adversarial patch make it more flexible and highly robust. Experimental results show that PatchFinger achieves an ARUC value of 0.936 in a series of tests on the Tiny-ImageNet dataset, which exceeds the baseline by 19%. In terms of average query accuracy, PatchFinger reaches 97.04%, outperforming the method tested.
Attribution of Adversarial Attacks via Multi-Task Learning
ABSTRACT. Deep neural networks (DNNs) can be easily fooled by adversarial examples during the inference phase when attackers add imperceptible perturbations to original examples. Many works focus on adversarial detection and adversarial training to defend against adversarial attacks. However, few works explore the tool-chains behind adversarial examples, which is called the Adversarial Attribution Problem (AAP). In this paper, AAP is defined as the recognition of three signatures, i.e., attack algorithm, victim model and hyperparameter. Existing works transfer AAP into a single-label classification task and ignore the relationship between the above three signatures. In fact, there exists an owner-member relationship between attack algorithm and hyperparameter, which means hyperparameter recognition relies on the result of attack algorithm classification. Besides, the value of a hyperparameter is continuous, hence hyperparameter recognition should be regarded as a regression task. As a result, AAP should be considered as a multi-task learning problem rather than a single-label classification problem or a single-task learning problem. To deal with the above problems, we propose a multi-task learning framework named Multi-Task Adversarial Attribution (MTAA) to recognize the three signatures simultaneously. It takes the relationship between attack algorithm and the corresponding hyperparameter into account and uses the uncertainty weighted loss to adjust the weights of the three recognition tasks. The experimental results on MNIST and ImageNet show the feasibility and scalability of the proposed framework.
Improve Conversational Search with Multi-Document Information
ABSTRACT. Conversational Search (CS) aims to satisfy complex information needs via multi-turn user-agent interactions. During this process, multiple documents need to be retrieved based on the conversation history to respond to the user. However, the conversation history may contain documents with irrelevant information, which would negatively affect the retrieval performance. Since the information in documents is not only implicit in the conversation history but also present in the context of document passages, current approaches still make it difficult for the model to distinguish information irrelevant to the user's current question at the semantic level. To enhance the model's ability to comprehend conversations and to distinguish passages in irrelevant documents using multi-document information, and given the lack of training data for extracting such information, we propose an unsupervised multi-document conversation segmentation method and a zero-shot Large Language Model (LLM)-based document summarization method to extract multi-document information from the conversation history and the documents, respectively. We further propose the Passage-Segment-Document (PSD) post-training method to train the reranker using the extracted multi-document information in combination with a multi-task learning method. On the MultiDoc2Dial dataset, our results are 1.3% and 1.4% higher than the SOTA on the R@1 and MRR@5 metrics, respectively, which verifies the improvement of our method on retrieval performance. Extensive experiments show the strong performance of our method for dealing with conversation histories that contain multi-document information.
Naturalistic Emotion Recognition Using EEG and Eye Movements
ABSTRACT. Emotion recognition in affective brain-computer interfaces (aBCI) has emerged as a prominent research area. However, existing experimental paradigms for collecting emotional data often rely on stimuli-based elicitation, which may not accurately reflect emotions experienced in everyday life. Moreover, these paradigms are limited in terms of stimulus types and lack investigation into decoding naturalistic emotional states. To address these limitations, we propose a novel experimental paradigm that enables the recording of physiological signals in a more natural way. In our approach, emotions are allowed to arise spontaneously, unrestricted by specific experimental activities. Participants have the autonomy to determine the start and end of each recording session and provide the corresponding emotion label. Over a period of three months, we recruited six subjects and collected data through multiple recording sessions per subject. We utilized electroencephalogram (EEG) and eye movement signals in both subject-dependent and cross-subject settings. In the subject-dependent unimodal condition, our attentive simple graph convolutional network (ASGC) achieved the highest accuracy of 76.32% for emotion recognition based on EEG data. For the cross-subject unimodal condition, our domain adversarial neural network (DANN) outperformed other models, achieving an average accuracy of 71.90% based on EEG data. These experimental results demonstrate the feasibility of recognizing emotions in naturalistic settings. The proposed experimental paradigm holds significant potential for advancing emotion recognition in various practical applications. By allowing emotions to unfold naturally, our approach enables the future emergence of more robust and applicable emotion recognition models in the field of aBCI.
Preserving Potential Neighbors for Low-Degree Nodes via Reweighting in Link Prediction
ABSTRACT. Link prediction is an important task for graph data. Methods based on graph neural networks achieve high accuracy by training on the observed graph. However, these methods often perform worse on low-degree nodes. Through theoretical analysis, we find that current link prediction models focus more on negative samples for nodes with lower degrees, which makes it hard for the model to find potential neighbors of these nodes during inference. In order to improve the performance on low-degree nodes, we first design a node-wise score to quantify how seriously the training is biased toward negative samples. Based on the score, we develop a reweighting method called harmonic weighting (HAW) to help the model preserve potential neighbors for low-degree nodes. Experimental results show that models combined with HAW achieve better performance on most datasets. By comparing the performance on nodes of different degrees and visualizing node embeddings, we find that HAW can help models train on low-degree nodes while keeping the performance on other nodes.
Unconstrained Feature Model and Its General Geometric Patterns in Federated Learning: Local Subspace Minority Collapse
ABSTRACT. Federated Learning (FL) is a decentralized approach to machine learning that enables multiple parties to collaborate in building a shared model without centralizing data, but it can face issues related to client drift and the heterogeneity of data. However, there is a noticeable absence of thorough analysis regarding the characteristics of client drift and data heterogeneity in FL within existing studies. In this paper, we reformulate FL using client-class sampling as an unconstrained feature model (UFM), and validate the soundness of UFM in FL through theoretical proofs and experiments. Based on the model, we explore the potential information loss, the source of client drift, and general geometric patterns in FL, called local subspace minority collapse. Through theoretical deduction and experimental verification, we provide support for the soundness of UFM and observe its predicted phenomenon, neural collapse.
Dual Knowledge Distillation for Neural Machine Translation
ABSTRACT. Existing knowledge distillation methods use a large amount of bilingual data and focus on mining the corresponding knowledge distribution between the source language and the target language. However, for some languages, bilingual data is not abundant. In this paper, to make better use of both monolingual and limited bilingual data, we propose a new knowledge distillation method called Dual Knowledge Distillation (DKD). For monolingual data, we use a self-distillation strategy which combines self-training and knowledge distillation for the encoder to extract more consistent monolingual representations. For bilingual data, on top of the k Nearest Neighbor Knowledge Distillation (kNN-KD) method, a similar self-distillation strategy is adopted as a consistency regularization method to force the decoder to produce consistent output. Experiments on standard datasets, multi-domain translation datasets, and low-resource datasets show that DKD achieves consistent improvements over state-of-the-art baselines including kNN-KD.
Latent Causal Dynamics Model for Model-based Reinforcement Learning
ABSTRACT. Learning an accurate dynamics model is the key task for model-based reinforcement learning (MBRL). Most existing MBRL methods learn the dynamics model over states. However, in most cases, the relationships among states are complex because the states are affected by the interaction of various factors in the environment. Recently, some works have proposed learning the dynamics model in a latent representation space. However, the learned model is dense and may contain spurious associations between latent representations. To deal with these problems, we introduce a latent causal dynamics model over latent representations and provide a learning method for MBRL. Specifically, we first learn the latent representations from the observed state space. Second, we learn a latent causal dynamics model among latent representations by a causal discovery method. Finally, the latent causal dynamics model is used to aid policy learning. The above steps are iterated to update the unified loss function until convergence. Experimental results on four tasks show that the performance of our proposed method benefits from the causality and the learned latent representations.
SODet: A LiDAR-based Object Detector in Bird's-Eye View
ABSTRACT. LiDAR-based object detection is of paramount significance in the realm of autonomous driving applications. Nevertheless, the detection of small objects from a bird's-eye view perspective remains challenging. To address this issue, the paper presents SODet, an efficient single-stage 3D object detector designed to enhance the perception of small objects like pedestrians and cyclists. SODet incorporates several key components and techniques. To capture broader context information and augment the capability of feature representation, the model constructs residual blocks comprising large-kernel depthwise convolutions and inverted bottleneck structures, forming the foundation of the CSP-based NeXtDark backbone network. Furthermore, the NeXtFPN feature extraction network is designed with the introduced SPPF module and the proposed special residual blocks, enabling the extraction and fusion of multi-scale information. Additionally, training strategies such as mosaic data augmentation and cosine annealing learning rate are employed to further improve small object detection accuracy. The effectiveness of SODet is demonstrated through experimental results on the KITTI dataset, showcasing a remarkable enhancement in detecting small objects from a bird's-eye perspective while maintaining a detection speed of 20.6 FPS.
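The cosine annealing learning rate named among the training strategies above follows a widely used schedule that decays the rate from a maximum to a minimum along half a cosine period. The sketch below shows that generic formula only. The hyperparameter values are placeholders, and the abstract does not state SODet's actual settings or whether warm restarts are used.

```python
import math


def cosine_annealing_lr(step, total_steps, lr_max, lr_min=0.0):
    """Learning rate at `step` for a single cosine decay from lr_max to lr_min."""
    step = min(max(step, 0), total_steps)  # clamp to the schedule range
    cos_term = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return lr_min + (lr_max - lr_min) * cos_term


# Placeholder values, not taken from the paper.
schedule = [cosine_annealing_lr(s, 100, lr_max=0.01, lr_min=1e-4) for s in range(101)]
```

The rate changes slowly near the start and the end of training and fastest in the middle, which is the usual motivation for preferring this schedule over a linear decay.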
MVFAN: Multi-View Feature Assisted Network for 4D Radar Object Detection
ABSTRACT. 4D radar is recognized for its resilience and cost-effectiveness under adverse weather conditions, thus playing a pivotal role in autonomous driving. While cameras and LiDAR are typically the primary sensors used in perception modules for autonomous vehicles, radar serves as a valuable supplementary sensor. Unlike LiDAR and cameras, radar remains unimpaired by harsh weather conditions, thereby offering a dependable alternative in challenging environments. Developing radar-based 3D object detection not only augments the competency of autonomous vehicles but also provides economic benefits. In response, we propose the Multi-View Feature Assisted Network (MVFAN), an end-to-end, anchor-free, and single-stage framework for 4D-radar-based 3D object detection for autonomous vehicles. We tackle the issue of insufficient feature utilization by introducing a novel Position Map Generation module to enhance feature learning by reweighing foreground and background points, and their features, considering the irregular distribution of radar point clouds. Additionally, we propose a pioneering backbone, the Radar Feature Assisted backbone, explicitly crafted to fully exploit the valuable Doppler velocity and reflectivity data provided by the 4D radar sensor. Comprehensive experiments and ablation studies carried out on Astyx and VoD datasets attest to the efficacy of our framework. The incorporation of Doppler velocity and RCS reflectivity dramatically improves the detection performance for small moving objects such as pedestrians and cyclists. Consequently, our approach culminates in a highly optimized 4D-radar-based 3D object detection capability for autonomous driving systems, setting a new standard in the field.
Mitigation of Voltage Violation for Battery Charging Based on Data-Driven Optimization
ABSTRACT. Fast-charging of lithium-ion batteries is a key technology to reduce charging time and enhance user convenience. However, as the power of the battery increases, the battery voltage rises rapidly during fast charging. Once the battery voltage increases above the demand limited voltage, the battery will be irreversibly damaged. Naturally, the fast-charging process is a multi-objective optimization problem, where charging time and battery voltage violation are two conflicting objectives. In this work, we propose a multi-objective reinforcement learning algorithm for finding fast-charging strategies considering the trade-off between charging time and battery voltage violation. Firstly, the lithium-ion battery Doyle-Fuller-Newman (DFN) model is established, and the fast-charge process is built based on the DFN model. Secondly, the fast-charge process is formulated as a Markov decision process (MDP). Thirdly, the multi-objective reinforcement learning algorithm is proposed to solve the MDP problem. Finally, the proposed algorithm is verified by simulation experiments. The simulation results show that the proposed algorithm can produce effective fast-charging strategies for lithium-ion batteries according to the given preference.
Structural Properties of Associative Knowledge Graphs
ABSTRACT. This paper introduces a novel structural approach to constructing associative knowledge graphs. These graphs are composed of many overlapping scenes, with each scene representing a specific set of objects. The knowledge graph nodes represent various objects present within the scenes. In the knowledge graph, each scene is represented as a complete subgraph. It is important to note that the same object can appear in multiple scenes. The recreation of the stored scenes from the knowledge graph occurs through association with a given context, which includes some of the objects stored in the graph. The memory capacity of the system is determined by the size of the graph and the density of its synaptic connections. Theoretical dependencies are derived to describe both the critical graph density and the memory capacity of scenes stored in such graphs. The critical graph density represents the maximum density at which it is possible to reproduce all elements of the scene without errors.
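The construction described above, with each scene stored as a complete subgraph over its objects, can be made concrete with a small sketch. The graph building and the density formula follow directly from the abstract. The recall rule is an assumption made here for illustration (return every object linked to all context objects), and the names `build_graph`, `density`, and `recall_scene` are hypothetical. The paper's theoretical capacity results are not reproduced.

```python
from itertools import combinations


def build_graph(scenes):
    """Store every scene as a clique; edges are kept as sorted (u, v) tuples."""
    nodes, edges = set(), set()
    for scene in scenes:
        nodes.update(scene)
        edges.update(combinations(sorted(set(scene)), 2))
    return nodes, edges


def density(nodes, edges):
    """Fraction of possible undirected edges that are present."""
    n = len(nodes)
    return 0.0 if n < 2 else len(edges) / (n * (n - 1) / 2)


def recall_scene(nodes, edges, context):
    """Assumed recall rule: context plus every object linked to all context objects."""
    def linked(u, v):
        return (min(u, v), max(u, v)) in edges

    context = set(context)
    return context | {x for x in nodes - context if all(linked(x, c) for c in context)}


scenes = [{"cup", "table", "chair"}, {"table", "lamp", "book"}, {"cup", "sink", "soap"}]
nodes, edges = build_graph(scenes)
graph_density = density(nodes, edges)                      # 9 edges out of 21 possible
recalled = recall_scene(nodes, edges, {"cup", "table"})    # recovers the first scene
ambiguous = recall_scene(nodes, edges, {"table"})          # mixes two scenes
```

In this toy example a two-object context recovers its scene exactly, while a one-object context that belongs to two scenes does not. As more overlapping scenes are stored the graph becomes denser and such spurious matches become more likely, which is the intuition behind a critical density.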
Generalized Category Discovery with Clustering Assignment Consistency
ABSTRACT. Generalized category discovery (GCD) is a recently proposed open-world task. Given a set of images consisting of labeled and unlabeled instances, the goal of GCD is to automatically cluster the unlabeled samples using information transferred from the labeled dataset. The unlabeled dataset comprises both known and novel classes. The main challenge is that unlabeled novel class samples and unlabeled known class samples are mixed together in the unlabeled dataset. To address GCD without knowing the number of classes in the unlabeled dataset, we propose a co-training-based framework that encourages clustering consistency. Specifically, we first introduce weak and strong augmentation transformations to generate two sufficiently different views for the same sample. Then, based on the co-training assumption, we propose a consistency representation learning strategy, which encourages consistency between feature-prototype similarity and clustering assignment. Finally, we use the discriminative embeddings learned from the semi-supervised representation learning process to construct an original sparse network and use a community detection method to obtain the clustering results and the number of categories simultaneously. Extensive experiments show that our method achieves state-of-the-art performance on three generic benchmarks and three fine-grained visual recognition datasets. In particular, on the ImageNet-100 dataset, our method significantly exceeds the best baseline by 15.5% and 7.0% on the Novel and All classes, respectively.
Few-shot Anomaly Detection in Text with Deviation Learning
ABSTRACT. Most current methods for detecting anomalies in text concentrate on constructing models solely relying on unlabeled data. These models operate on the presumption that no labeled anomalous examples are available, which prevents them from utilizing prior knowledge of anomalies that are typically present in small numbers in many real-world applications. Furthermore, these models prioritize learning feature embeddings rather than optimizing anomaly scores directly, which could lead to suboptimal anomaly scoring and inefficient use of data during the learning process. In this paper, we introduce FATE, a deep few-shot learning-based framework that leverages limited anomaly examples and learns anomaly scores explicitly in an end-to-end manner using deviation learning. In this approach, the anomaly scores of normal examples are adjusted to closely resemble reference scores obtained from a prior distribution. Conversely, anomaly samples are forced to have anomalous scores that considerably deviate from the reference score in the upper tail of the prior. Additionally, our model is optimized to learn the distinct behavior of anomalies by utilizing a multi-head self-attention layer and multiple instance learning approaches. Comprehensive experiments on several benchmark datasets demonstrate that our proposed approach attains a new level of state-of-the-art performance.
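The deviation-learning objective described in this abstract can be illustrated with a minimal sketch. It is not FATE's implementation: the standard normal prior, the margin of 5, the number of reference samples, and the function name `deviation_loss` are all assumptions made for illustration. Normal examples are penalized for any deviation from the mean reference score, while anomalies are penalized only if they fall less than `margin` standard deviations above it.

```python
import numpy as np

def deviation_loss(scores, labels, margin=5.0, n_ref=5000, seed=0):
    """Deviation-style loss on scalar anomaly scores (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Reference scores drawn from an assumed N(0, 1) prior.
    ref = rng.standard_normal(n_ref)
    # Standardized deviation of each score from the reference mean.
    dev = (np.asarray(scores, dtype=float) - ref.mean()) / ref.std()
    labels = np.asarray(labels, dtype=float)  # 1 = anomaly, 0 = normal
    # Normal: stay close to the reference. Anomaly: exceed it by `margin`.
    per_example = (1.0 - labels) * np.abs(dev) + labels * np.maximum(0.0, margin - dev)
    return float(per_example.mean())
```

Scoring the two normal examples near 0 and the anomaly near 6 gives a loss close to zero; swapping those scores gives a large loss.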
Introducing Semantic-based Receptive Field into Semantic Segmentation via Graph Neural Networks
ABSTRACT. Current semantic segmentation models typically use deep learning models as encoders. However, these models have a fixed receptive field, which can cause mixed information within the receptive field and lead to confounding effects during neural network training. To address these limitations, we propose the "semantic-based receptive field" based on our analysis of current models. This approach seeks to improve segmentation performance by aggregating image patches according to the similarity of their representations rather than their physical location, aiming to enhance the interpretability and accuracy of semantic segmentation models.
For implementation, we integrate graph representation learning (GRL) approaches into current semantic segmentation models. Specifically, we divide the input image into patches and construct them into graph-structured data that expresses semantic similarity. Our Graph Convolution Receptor (GCR) block uses graph-structured data purpose-built from image data and adopts a node-classification-like perspective to address the problem of semantic segmentation. The GCR module models the relationship between semantically related patches, allowing us to mitigate the adverse effects of confounding information and improve the quality of feature representation. By adopting this approach, we aim to enhance the accuracy and robustness of the semantic segmentation task.
Finally, we evaluated our proposed module on multiple semantic segmentation models and compared its performance to baseline models on multiple semantic segmentation datasets. Our empirical evaluations demonstrate the effectiveness and robustness of our proposed module, as it consistently outperformed baseline models on these datasets.
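The patch-graph idea in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' GCR block: raw pixel values stand in for learned patch features, the graph is a cosine-similarity k-nearest-neighbour graph, and a single mean-aggregation step stands in for a graph convolution. The names `patchify`, `knn_similarity_graph`, and `aggregate` are hypothetical.

```python
import numpy as np

def patchify(image, patch):
    """Split an (H, W, C) image into non-overlapping, flattened patches."""
    h, w, c = image.shape
    gh, gw = h // patch, w // patch
    x = image[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch, c)
    # Reorder to (grid_row, grid_col, patch_row, patch_col, channel).
    return x.transpose(0, 2, 1, 3, 4).reshape(gh * gw, patch * patch * c)

def knn_similarity_graph(features, k):
    """Boolean adjacency linking each node to its k most cosine-similar nodes."""
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    sim = f @ f.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-loops
    nbrs = np.argsort(-sim, axis=1)[:, :k]
    adj = np.zeros(sim.shape, dtype=bool)
    adj[np.arange(len(f))[:, None], nbrs] = True
    return adj

def aggregate(features, adj):
    """Mean of neighbour features: neighbours are chosen by similarity, not position."""
    deg = adj.sum(axis=1, keepdims=True)
    return (adj.astype(float) @ features) / np.maximum(deg, 1)
```

In an image with two red and two blue patches placed diagonally, each patch is linked to the distant patch of the same colour rather than to its spatial neighbours.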
A compliant elbow exoskeleton with an SEA at interaction port
ABSTRACT. In recent years, various series elastic actuators (SEAs) have been proposed to enhance the flexibility and safety of wearable exoskeletons. This paper proposes an SEA composed of wave springs and installs it at the human-robot interaction port. Considering the hysteresis nonlinear characteristics of the SEA, displacement-force models of the SEA are established based on a long short-term memory (LSTM) model and a T-S fuzzy model in a nonlinear auto-regression moving average with exogenous input (NARMAX) structure. Based on the established models, the SEA can effectively serve as an interaction force sensor. Subsequently, the SEA is integrated into an elbow exoskeleton, and a compliant admittance controller is designed based on the displacement-force model. Experimental results demonstrate that the proposed approach effectively enhances the flexibility of human-robot interaction.
Theoretical Analysis of Gradient-Zhang Neural Network for Time-Varying Equations and Improved Method for Linear Equations
ABSTRACT. Solving time-varying equations is fundamental in science and engineering. This paper aims to find a fast-converging and high-precision method for solving time-varying equations. We combine two classes of feedback neural networks, i.e., gradient neural network (GNN) and Zhang neural network (ZNN), to construct a continuous gradient-Zhang neural network (GZNN) model. Our research shows that GZNN has the advantages of high convergence precision of ZNN and fast convergence speed of GNN in certain cases, i.e., when all the eigenvalues of the Jacobian matrix of the time-varying equations multiplied by its transpose are larger than 1. Furthermore, we conduct the mathematical proof and theoretical analysis to establish the stability and convergence of the GZNN model. Additionally, we discretize the GZNN model by utilizing time discretization formulas (i.e., Euler and Taylor-Zhang discretization formulas), to construct corresponding discrete GZNN algorithms for solving discrete time-varying problems. Different discretization formulas can construct discrete algorithms with varying precision. As the time sampling instants increase, the precision of discrete algorithms can be further improved. Furthermore, we optimize the matrix inverse operation in the GZNN model and develop inverse-free GZNN algorithms to solve linear problems, effectively reducing their time complexity. Finally, numerical experiments are conducted to validate the feasibility of the GZNN model and the corresponding discrete algorithms in solving time-varying equations, as well as the efficiency of the inverse-free method in solving linear equations.
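To make the two building blocks named in this abstract concrete, the sketch below applies Euler-discretized GNN and ZNN dynamics to a scalar time-varying linear equation a(t)x(t) = b(t). It does not reproduce the paper's GZNN combination, whose exact form the abstract does not give. The test problem a(t) = 2 + sin t, b(t) = cos t, the gain `gamma`, the step size `h`, and the function `simulate` are all assumptions chosen for illustration. The GNN follows the negative gradient of the squared residual; the ZNN imposes the error dynamics de/dt = -gamma * e on the residual e = a x - b, which uses the time derivatives of a and b.

```python
import math

def a(t): return 2.0 + math.sin(t)
def b(t): return math.cos(t)
def da(t): return math.cos(t)    # time derivative of a
def db(t): return -math.sin(t)   # time derivative of b

def simulate(model, gamma=10.0, h=1e-3, t_end=5.0, x0=0.0):
    """Euler-integrate the chosen dynamics; return the final residual |a x - b|."""
    if model not in ("gnn", "znn"):
        raise ValueError("model must be 'gnn' or 'znn'")
    x = x0
    for k in range(int(round(t_end / h))):
        t = k * h
        err = a(t) * x - b(t)
        if model == "gnn":
            # Gradient descent on 0.5 * err**2; ignores how a and b drift.
            dx = -gamma * a(t) * err
        else:
            # From d(err)/dt = -gamma * err, solved for dx/dt.
            dx = (-da(t) * x + db(t) - gamma * err) / a(t)
        x += h * dx
    return abs(a(t_end) * x - b(t_end))
```

On this toy problem the GNN residual settles at a lag error that shrinks only as `gamma` grows, whereas the ZNN residual is limited mainly by the discretization step.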
Distributed Nash Equilibrium Seeking of Noncooperative Games with Communication Constraints and Matrix Weights
ABSTRACT. Distributed Nash equilibrium seeking is investigated in this paper for a class of multi-agent systems under intermittent communication and matrix-weighted communication graphs. Different from most of the existing works on distributed Nash equilibrium seeking of noncooperative games where the players (agents) can communicate continuously over time, the players considered in this paper are assumed to exchange information only with their neighbors during some disconnected time intervals while the underlying communication graph is matrix-weighted. A distributed Nash equilibrium seeking algorithm integrating gradient strategy and leader-following consensus protocol is proposed for the noncooperative games with intermittent communication and matrix-weighted communication graphs. The effect of the average intermittent communication rate on the convergence of the distributed Nash equilibrium seeking algorithm is analyzed, and a lower bound of the average intermittent communication rate that ensures the convergence of the algorithm is given. The convergence of the algorithm is established by means of Lyapunov stability theory. Simulations are presented to verify the proposed distributed Nash equilibrium seeking algorithm.
Adaptive load frequency control and optimization based on TD3 algorithm and linear active disturbance rejection control
ABSTRACT. This paper presents a Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm optimized Linear Active Disturbance Rejection Control (LADRC) approach to tackle the problem of frequency deviation resulting from load disturbance and Renewable Energy Sources (RESs) in interconnected power systems. The LADRC approach employs a Linear Extended State Observer (LESO) to estimate the disturbance information in each area and utilizes a Proportional-Derivative (PD) controller to eliminate the disturbance. Simultaneously, the TD3 algorithm is trained to acquire the adaptive controller parameters. In order to improve the convergence of the TD3 algorithm, a Lyapunov-reward shaping function is adopted. Finally, the proposed method is applied to a two-area interconnected power system, comprising thermal, hydro, and gas power plants in each area, as well as RESs such as a noise-based wind turbine and photovoltaic (PV) system. The simulation results indicate that the proposed method is a highly effective approach for load frequency control.
Integrated Design of Fully Distributed Adaptive State Estimation and Consensus Control for Multi-Agent Systems
ABSTRACT. In this paper, the problem of fully distributed adaptive state estimation and consensus control for linear multi-agent systems is investigated. By designing fully distributed adaptive output tracking observers, the entire output information is made available to each agent, and local state estimators based on the estimated output are constructed to estimate the overall state of the multi-agent system. The consensus control protocol based on state estimation is designed to ensure that the agents achieve consensus. The proposed control input of each agent relies on its own estimation of the entire state. Theoretical analysis proves the effectiveness of the algorithm, and practical applications are illustrated by simulation.
High-order control barrier function based robust collision avoidance formation tracking of constrained multi-agent systems
ABSTRACT. In this work, we propose a safe formation tracking controller based on high-order control barrier functions (HOCBFs) for second-order multi-agent systems subject to input uncertainties and both velocity and input constraints (VICs). First, a nominal velocity and input constrained formation tracking controller is proposed, which uses sliding mode control theory to eliminate the effects of the uncertain dynamics. Then, the HOCBFs-based collision avoidance conditions are derived for the followers, where collisions both among the agents and between the agents and the obstacles are considered. Finally, the collision avoidance formation tracking controller for the constrained uncertain second-order multi-agent systems is constructed by formulating a local quadratic programming (QP) problem for each follower. It is shown that under proper initial conditions, there always exist feasible control inputs such that collision avoidance can be guaranteed under both VICs of the agents. Simulation examples illustrate the effectiveness of the proposed control strategy.
Distributed Neurodynamic Approach for Optimal Allocation with Separable Resource Losses
ABSTRACT. To solve the optimal allocation problem with separable resource losses, this paper proposes a neurodynamic method based on a multi-agent system. By using the KKT conditions, the nonlinear coupling equality constraint in the original problem is equivalently transformed into a convex coupling inequality constraint. Then, with the help of a finite-time tracking technique and a fixed-time projection method, a neurodynamic method is designed and its convergence is rigorously proved. Finally, the simulation results verify the effectiveness of the proposed neurodynamic method.
Policy Representation Opponent Shaping via Contrastive Learning
ABSTRACT. To acquire results with higher social welfare in social dilemmas, agents need to maintain cooperation. Independent agents manage to navigate social dilemmas via opponent shaping. However, opponent shaping needs extra opponent information, which is not always accessible in mixed tasks if agents are decentralized. To address this, we present PROS, which runs in a fully independent setting and needs no extra information. PROS shapes the opponent with an extended policy that takes the opponent’s dynamics as additional input. Instead of receiving the policy from the opponent, we discriminate the policy representation via contrastive learning. In experiments, PROS reaches the optimal Nash equilibrium in the iterated prisoners’ dilemma (IPD) and shows the same ability to maintain cooperation in Coin Game, a high-dimensional version of IPD.
Distributed State Estimation for Multi-Agent Systems Under Consensus Control
ABSTRACT. Distributed state estimation and consensus control for linear time-invariant multi-agent systems under a strongly connected directed graph are addressed in this paper. The distributed output tracking algorithm and the local state estimator are designed for each agent to estimate the output and state of the entire multi-agent system, despite having access only to local output measurements that are insufficient to directly reconstruct the entire state. The consensus control protocol is further designed based on each agent's own estimate of the entire state. Neither distributed state estimation nor consensus control protocol design requires state information from neighboring agents, eliminating the transmission of state estimates during the whole process. The theoretical analysis demonstrates that distributed output tracking and state estimation are realized and that all agents achieve consensus. Finally, numerical simulations are carried out to show the effectiveness of the proposed algorithm.
Theory-guided Convolutional Neural Network with an Enhanced Water Flow Optimizer
ABSTRACT. Theory-guided neural networks have recently been used to solve partial differential equations. This method has received widespread attention due to its low data requirements and adherence to physical laws during the training process. However, the selection of the punishment coefficient for including physical laws as a penalty term in the loss function undoubtedly affects the performance of the model. In this paper, we propose a comprehensive theory-guided framework using a bilevel programming model that can adaptively adjust the hyperparameters of the loss function to further enhance the performance of the model. An enhanced water flow optimizer (EWFO) algorithm is applied to optimize upper-level variables in the framework. In this algorithm, an opposition-based learning technique is used in the initialization phase to boost the initial group quality, and a nonlinear convergence factor is added to the laminar flow operator to upgrade the diversity of the group and expand the search range. The experiments show the competitive performance of the method in solving stochastic partial differential equations.
Ensemble of randomized neural network and boosted trees for eye-tracking-based driver situation awareness recognition and interpretation
ABSTRACT. Ensuring traffic safety is crucial in the pursuit of sustainable transportation. Across diverse traffic systems, maintaining good situation awareness (SA) is important in promoting and upholding traffic safety. This work focuses on a regression problem of using eye-tracking features to perform SA recognition in the context of conditionally automated driving. For tabular data of this kind, recent advances have shown that both neural networks (NNs) and gradient-boosted decision trees (GBDTs) are potential solutions to achieve better performance. To avoid the complex analysis needed to select a suitable model for the task, this work proposes combining NNs and tree-based models to achieve better overall performance on SA assessment. Considering the necessity of real-time measurement for practical applications, the ensemble deep random vector functional link (edRVFL) and light gradient boosting machine (lightGBM) were used as the representative models of NNs and GBDTs in the investigation, respectively. Furthermore, this work exploited Shapley additive explanations (SHAP) to interpret the contributions of the input features, upon which we further developed two ensemble modes. Experimental results demonstrated that the proposed model outperformed the baseline models, highlighting its effectiveness. In addition, the interpretation results can also provide practitioners with references regarding the eye-tracking features that are more relevant to SA recognition.
Communication-Efficient Distributed Minimax Optimization via Markov Compression
ABSTRACT. Recently, the minimax problem has attracted a lot of attention due to its wide applications in modern machine learning fields such as GANs. With the exponential growth of data volumes and increasing problem sizes, the design of distributed algorithms to train high-performance models has become imperative. However, distributed algorithms often suffer from communication bottlenecks. To address this challenge, in this paper, we propose a communication-efficient distributed compressed stochastic gradient descent ascent algorithm, abbreviated as DCSGDA, in a parameter-server setting. To reduce the communication cost, each client in DCSGDA transmits the compressed gradients of the primal and dual variables to the server at each iteration. In particular, we leverage a Markov compression mechanism that allows both unbiased and biased compressors to mitigate the negative effect of compression errors on convergence. Namely, we show theoretically that the DCSGDA algorithm can still achieve linear convergence in the presence of compression errors, provided that the local objective function is strongly-convex-strongly-concave. Finally, numerical experiments demonstrate the desirable communication efficiency and efficacy of the proposed DCSGDA.
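The compression idea in this abstract can be illustrated with a small sketch. It is not the DCSGDA algorithm: the two base compressors (top-k, which is biased, and rescaled random-k, which is unbiased) are standard examples, and the `MarkovCompressor` class reflects one common reading of "Markov compression", assumed here rather than taken from the paper, in which only the compressed difference between the new gradient and the previously reconstructed state is transmitted.

```python
import numpy as np

def top_k(g, k):
    """Biased compressor: keep the k largest-magnitude entries, zero the rest."""
    out = np.zeros_like(g)
    idx = np.argpartition(np.abs(g), -k)[-k:]
    out[idx] = g[idx]
    return out

def rand_k(g, k, rng):
    """Unbiased compressor: keep k random entries, rescaled by d / k."""
    out = np.zeros_like(g)
    idx = rng.choice(g.size, size=k, replace=False)
    out[idx] = g[idx] * (g.size / k)
    return out

class MarkovCompressor:
    """Stateful wrapper (assumed form): state <- state + base(g - state)."""

    def __init__(self, dim, base):
        self.state = np.zeros(dim)  # reconstruction shared by sender and receiver
        self.base = base

    def __call__(self, g):
        delta = self.base(g - self.state)  # the only message that is sent
        self.state = self.state + delta
        return self.state
```

With a constant gradient and top-1 compression, the reconstructed state recovers the full gradient after as many rounds as it has nonzero entries, which shows how the stateful wrapper absorbs the error of a biased compressor over time.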
Cascaded fuzzy PID control for quadrotor UAVs based on RBF neural networks
ABSTRACT. Since quadrotor UAVs often need to fly in complex and changing environments, their systems suffer from slow smooth control response, weak self-turbulence capability, and poor self-adaptability. Thus, it is crucially important to carefully formulate a quadrotor UAV control system that can maintain high-precision control and high immunity to disturbance in complex environments. In this paper, an improved nonlinear cascaded fuzzy PID control approach for quadrotor UAVs based on an RBF neural network is proposed. Based on the analysis and establishment of the UAV flight control model, this paper designs a control approach with an outer-loop fuzzy adaptive PID control and an inner-loop RBF neural network. The simulation results show that introducing RBF neural networks into the nonlinear fuzzy adaptive PID control yields higher control precision and stronger disturbance rejection under the influence of different environmental variables.
A Deep Graph Matching-Based Method for Trajectory Association in Vessel Traffic Surveillance
ABSTRACT. Vessel traffic surveillance in inland waterways extensively relies on the Automatic Identification System (AIS) and video cameras. While video data only captures the visual appearance of vessels, AIS data serves as a valuable source of vessel identity and motion information, such as position, speed, and heading. To gain a comprehensive understanding of the behavior and motion of known-identity vessels, it is necessary to fuse the AIS-based and video-based trajectories. An important step in this fusion is to obtain the correspondence between moving targets by trajectory association. Thus, we focus solely on trajectory association in this work and propose a trajectory association method based on deep graph matching. We formulate trajectory association as a graph matching problem and introduce an attention-based flexible context aggregation mechanism to exploit the semantic features of trajectories. Compared to traditional methods that rely on manually designed features, our approach captures complex patterns and correlations within trajectories through end-to-end training. The introduced dustbin mechanism can effectively handle outliers during matching. Experimental results on synthetic and real-world datasets demonstrate the exceptional performance of our method in terms of trajectory association accuracy and robustness.
Feature Reconstruction Distillation with Self-attention
ABSTRACT. A recently proposed knowledge distillation method based on feature map transfer verifies that the intermediate layers of the teacher model can be used as effective targets for training the student model for better generalization. Existing research mainly focuses on how to efficiently transfer knowledge between the intermediate layers of teacher and student models. However, it ignores the increase in the number of channels in the middle layer, which introduces redundant feature information, and there is also a lack of interaction between shallow features and deep features. To alleviate these two problems, we propose a new knowledge distillation method called Feature Reconstruction Knowledge Distillation (FRD). By reconstructing the features of the middle layer of the teacher model, the student model can learn more accurate feature information. In addition, to alleviate the problem that feature maps have only local information, we use a self-attention mechanism to fuse shallow and deep features, thereby enhancing the model's ability to handle global details. Through extensive experiments on different network architectures involving various teacher and student models, we observe that the proposed method significantly improves performance.
ABSTRACT. Text-to-image generation (T2I) has been a popular research field in recent years, and its goal is to generate corresponding photorealistic images through natural language text descriptions. Existing T2I models are mostly based on generative adversarial networks, but it is still very challenging to guarantee the semantic consistency between a given textual description and generated natural images. To address this problem, we propose a concise and practical novel framework, Conformer-GAN. Specifically, we propose the Conformer block, consisting of Convolutional Neural Network (CNN) and Transformer branches. The CNN branch is used to generate images conditionally from noise. The Transformer branch continuously focuses on the relevant words in natural language descriptions and fuses the sentence and word information to guide the CNN branch for image generation. Our approach can better merge global and local representations to improve the semantic consistency between textual information and synthetic images. Importantly, our Conformer-GAN can generate natural and realistic 512 × 512 images. Extensive experiments on the challenging public benchmark datasets CUB bird and COCO demonstrate that our method outperforms recent state-of-the-art methods both in terms of generated image quality and text-image semantic consistency.
Isomorphic Dual-Branch Network for Non-homogeneous Image Dehazing and Super-Resolution
ABSTRACT. Removing non-homogeneous haze from real-world images is a challenging task. Meanwhile, the popularity of high-definition imaging systems and compute-limited smart mobile devices has resulted in new problems, such as the high computational load caused by haze removal for large-size images, or the severe information loss caused by the degradation of both the haze and image downsampling, when applying existing dehazing methods. To address these issues, we propose an isomorphic dual-branch dehazing and super resolution network for non-homogeneous dehazing of a downsampled hazy image, which produces dehazed and enlarged images with sharp edges and high color fidelity. We quantitatively and qualitatively compare our network with several state-of-the-art dehazing methods under the condition of different downsampling scales. Extensive experimental results demonstrate that our method achieves superior performance in terms of both the quality of output images and the computational load. The code is publicly available at https://github.com/xxx/(will be public after publication).
ABSTRACT. Novel view synthesis (NVS) aims to synthesize photo-realistic images depicting a scene by utilizing existing source images. The core objective is that the synthesized images should be as close as possible to the scene content. In recent years, various approaches have shifted the focus towards the visual effect of images in continuous space or time. However, current methods for static scenes treat the rendering of images as isolated processes, neglecting the geometric consistency in static scenes. This usually results in incoherent visual experiences such as flicker or artifacts in synthesized image sequences. To address this limitation, we propose Multi-View Consistency View Synthesis (MCVS). MCVS leverages long short-term memory (LSTM) and a self-attention mechanism to model the spatial correlation between synthesized images, hence forcing them closer to the ground truth. MCVS not only enhances multi-view consistency but also improves the overall quality of the synthesized images. The proposed method is evaluated on the Tanks and Temples dataset and the FVS dataset. On average, the Learned Perceptual Image Patch Similarity (LPIPS) is better than state-of-the-art approaches by 0.14 to 0.16%, indicating the superiority of our approach.
YOLO-D: Dual-branch infrared distant target detection based on multi-level weighted feature fusion
ABSTRACT. Infrared distant target detection is crucial in border patrol, traffic management, and maritime search and rescue operations due to its adaptability to environmental factors. However, in order to implement infrared distant target detection for aerial patrols using Unmanned Aerial Vehicles (UAVs), the challenges such as low signal-to-clutter ratio (SCR), limited contrast, and small imaging area have to be addressed. To this end, the paper presents a dual-branch infrared distant target detection model. To be specific, the model incorporates a contour feature extraction branch to improve the network's ability in recognizing distant targets and a multi-level weighted feature fusion method that combines contour features with their original counterparts to enhance target representation. The proposed model is evaluated using the High-altitude Infrared Thermal Dataset for Unmanned Aerial Vehicles (HIT-UAV), which includes persons, cars, and bicycles as detection targets at altitudes ranging from 60 to 130 meters. Experimental results show that, in comparison with the state-of-the-art models, our model improves the Average Precision (AP) of persons, bicycles, and cars by 2%, 2.21%, and 0.39% on average, respectively, and improves the mean Average Precision (mAP) of all categories by 1.53%.
ABSTRACT. In image segmentation tasks for real-world applications, the number of semantic categories can be very large, and the number of objects in them can vary greatly. In this case, the multi-channel representation of the output mask for the segmentation model is inefficient. In this paper we explore approaches to overcome such a problem by using a single-channel output mask and additional input information about the desired class for segmentation. We call this information task embedding and we learn it in the process of the neural network model training. In our case, the number of tasks is equal to the number of segmentation categories. This approach allows us to build universal models that can be conveniently extended to an arbitrary number of categories without changing the architecture of the neural network. To investigate this idea we developed a transformer neural network segmentation model named TASFormer. We demonstrated that the highest quality results for task-aware segmentation are obtained using adapter technology as part of the model. To evaluate the quality of segmentation, we introduce a binary intersection over union (bIoU) metric, which is an adaptation of the standard mIoU for the models with a single-channel output. We analyze its distinguishing properties and use it to compare modern neural network methods. The experiments were carried out on the universal ADE20K dataset. The proposed TASFormer-based approach demonstrated state-of-the-art segmentation quality on it. The software implementation of the TASFormer method and the bIoU metric is publicly available at github.com/subake/TASFormer.
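A rough sketch of how an intersection-over-union score can be computed for single-channel (binary) masks, as this abstract describes, is given below. It is not the published bIoU definition: averaging over (image, category) pairs and scoring an empty prediction against an empty target as 1.0 are assumptions, and the function names are hypothetical. The authors' repository should be consulted for the actual metric.

```python
import numpy as np

def binary_iou(pred, target):
    """IoU between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # assumed convention: both masks empty counts as a match
    return float(np.logical_and(pred, target).sum() / union)

def mean_binary_iou(pairs):
    """Average binary IoU over an iterable of (pred, target) mask pairs."""
    return float(np.mean([binary_iou(p, t) for p, t in pairs]))
```

Each pair here corresponds to one query of the model: one image together with one requested category, yielding one single-channel mask.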
PAG: Protecting Artists’ Works from Personalizing Image Generative Models
ABSTRACT. Recent advances in conditional image generation have led to powerful personalized generation models that generate high-resolution artistic images based on simple text descriptions through tuning. However, the abuse of personalized generation models may also increase the risk of plagiarism and the misuse of artists’ painting styles. In this paper, we propose a novel method called Protecting Artworks from Personalizing Image Generative Models framework (PAG) to safeguard artistic images from the malicious use of generative models. By injecting learned target perturbations into the original artistic images, we aim to disrupt the tuning process and introduce the distortions that protect the authenticity and integrity of the artist’s style. Furthermore, human evaluations suggest that our PAG model offers a feasible and effective way to protect artworks, preventing the personalized generation models from generating similar images to the given artworks.
Image Blending Algorithm with Automatic Mask Generation
ABSTRACT. The field of image blending has gained popularity in recent years for its ability to create visually stunning content. However, current image blending algorithms have the following problems: 1) the manual creation of the image blending mask requires a lot of manpower and material resources; 2) the algorithms cannot effectively solve the problems of brightness distortion and low resolution. To this end, we propose a new image blending method: it combines semantic object detection and segmentation with corresponding mask generation to automatically blend images, while a two-stage iterative algorithm based on our proposed saturation loss and the PAN algorithm fixes brightness distortion and low-resolution issues. Results on publicly available datasets show that our method outperforms many classic image blending algorithms on various performance metrics such as PSNR and SSIM.
Progressive Supervision for Tampering Localization in Document Images
ABSTRACT. Tampering localization in document images plays an important role in the field of forensics and security. Although it has made great progress in recent years, it is far from being solved. In this work, we aim to improve the tampering localization performance by refining both sides of the localization model. On one hand, we propose a multi-view enhancement (MVE) module at the input side, which combines the RGB image, noise residual and texture information to obtain more forensic traces for tampering localization. On the other hand, at the output side, we propose both progressive supervision (PS) and detection assistance (DA) modules to provide more detailed supervision information. Under the progressive supervision, we calculate the BCE loss at each scale to extensively explore multi-scale features, which are vital for tampering localization. To exploit the tampering detection model, we adopt a KL loss to align the tampering localization and detection scores in the DA module, benefiting the estimation of the global tampered probability. In the experiments, we evaluate the proposed method on the benchmark dataset DocTamper, and the results demonstrate the effectiveness of the proposed method.
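The per-scale BCE supervision mentioned in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' PS module: the ground-truth mask is downsampled by max pooling (a cell counts as tampered if any of its pixels is), all scales are weighted equally, and the function names are hypothetical.

```python
import numpy as np

def bce(prob, target, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and a 0/1 target."""
    p = np.clip(prob, eps, 1.0 - eps)
    return float(-(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)).mean())

def downsample(mask, factor):
    """Max-pool a 2-D mask by an integer factor (assumed downsampling rule)."""
    h, w = mask.shape
    gh, gw = h // factor, w // factor
    cropped = mask[:gh * factor, :gw * factor]
    return cropped.reshape(gh, factor, gw, factor).max(axis=(1, 3))

def progressive_bce(preds_by_factor, mask):
    """Sum of BCE losses, one per scale; keys are downsampling factors."""
    return sum(bce(p, downsample(mask, f)) for f, p in preds_by_factor.items())
```

Here `preds_by_factor` maps each downsampling factor to the probability map predicted at that resolution, so coarse and fine outputs are each compared with a mask of matching size.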
Privacy-preserving Image Classification and Retrieval Scheme over Encrypted Images
ABSTRACT. Image retrieval is a crucial function in several emerging computer vision applications, including online medical diagnosis and image recognition systems. With an increasing number of images being generated and stored in public clouds, there is a growing concern about privacy leaks for outsourced images. To address this issue, we propose an efficient privacy-preserving image classification and retrieval system (PICR) that employs low-dimensional vectors to represent image categories and feature vectors, thereby improving retrieval efficiency and reducing index storage costs. First, we design a feature extraction model based on a convolutional neural network (CNN) to generate segmented hash codes that represent both image categories and features. Next, a cryptographic hierarchical index structure based on the category hash code is designed to improve retrieval accuracy and efficiency. Then, we employ random vectors and the Learning With Errors (LWE)-based secure k-Nearest Neighbour (kNN) algorithm to preserve the privacy of segmented hash codes and file-access patterns. Finally, we provide a security analysis that verifies the SHPIR scheme can protect image privacy as well as indexing and query privacy. Experimental evaluation demonstrates that our proposed scheme outperforms the existing state-of-the-art schemes in terms of retrieval accuracy, search efficiency and storage costs.
Multi-view Contrastive learning for Knowledge-aware Recommendation
ABSTRACT. Knowledge-aware recommendation has attracted increasing attention due to its wide application in alleviating data sparsity and cold-start problems, but real-world knowledge graphs (KGs) contain much noise from irrelevant entities. Recently, contrastive learning, a self-supervised learning (SSL) method, has shown excellent anti-noise performance in recommendation tasks. However, the inconsistency between the noisy embeddings used in SSL tasks and the original embeddings used in recommendation tasks limits the model's ability.
We propose a Multi-view Contrastive learning for Knowledge-aware Recommendation framework (MCKR) to solve the above problems. To remove inconsistencies, MCKR unifies the input of SSL and recommendation tasks and learns more representations from the contrastive learning method. To alleviate the noise from irrelevant entities, MCKR preprocesses the KG triples according to their type and randomly perturbs the graph structure with different weights. Then, a novel graph convolutional network is proposed to learn more reliable entity information in the KG. Extensive experiments on three popular benchmark datasets show that our approach achieves state-of-the-art performance. Further analysis shows that MCKR also performs well in reducing data noise.
PYGC: a PinYin Language Model Guided Correction Model for Chinese Spell Checking
ABSTRACT. Chinese Spell Checking (CSC) is an NLP task that detects and corrects erroneous characters in Chinese texts. Since people often use pinyin (pronunciation of Chinese characters) input methods or speech recognition to type text, most of these errors involve the misuse of phonetically or semantically similar characters. Previous attempts fuse pinyin information into the embedding layer of pre-trained language models. However, although they can learn from phonetic knowledge, they cannot make good use of this knowledge for error correction. In this paper, we propose a PinYin language model Guided Correction model (PYGC), which regards the Chinese pinyin sequence as an independent language model. Our model builds on two parallel transformer encoders to capture pinyin and semantic features respectively, with a late fusion module that fuses these two hidden representations to generate the final prediction. In addition, we perform an additional pronunciation prediction task on pinyin embeddings to ensure the reliability of the pinyin language model. Experiments on the widely used SIGHAN benchmark and a newly released CSCD-IME dataset with mainly pinyin-related errors show that our method outperforms current state-of-the-art approaches by a remarkable margin. Furthermore, isolation tests demonstrate that our model has the best generalization ability on unseen spelling errors.
Lateral Interactions Spiking Actor Network for Reinforcement Learning
ABSTRACT. Spiking neural networks (SNNs) have been shown to be a biologically plausible and energy-efficient alternative to deep neural networks (DNNs) in reinforcement learning (RL). In the prevailing SNN models for RL, fully-connected architectures with inter-layer connections are commonly employed, while intra-layer connections are neglected, which impedes the feature representation and information processing capacities of SNNs in the context of reinforcement learning. To address these limitations, we propose a high-performance Lateral Interactions Spiking Actor Network (LISAN) to improve decision-making in reinforcement learning tasks. LISAN integrates lateral interactions between neighboring neurons into the spiking neuron membrane potential equation. Moreover, recognizing the significance of residual potentials in preserving valuable information within biological neurons, we incorporate a soft reset mechanism to enhance the model's functionality. To verify the effectiveness of the proposed framework, LISAN is evaluated on four continuous control tasks from OpenAI gym with different encoding methods. The results show that LISAN achieves substantially better performance than state-of-the-art models. We hope that our work will contribute to a deeper understanding of the mechanisms involved in information capturing and processing within the brain.
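The soft reset mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the common definition of soft reset (subtracting the threshold after a spike so that the residual potential is kept) and contrasts it with a hard reset to zero. The function name `simulate_lif`, the decay factor, the threshold and the input currents are all hypothetical, and the lateral interaction term of LISAN is not modeled here.

```python
def simulate_lif(inputs, threshold=1.0, decay=0.5, soft_reset=True):
    """Simulate one leaky integrate-and-fire neuron (illustrative only).

    Returns the spike train and the membrane potential after each step.
    """
    v = 0.0
    spikes, potentials = [], []
    for current in inputs:
        # Leaky integration of the input current.
        v = decay * v + current
        if v >= threshold:
            spikes.append(1)
            # Soft reset keeps the residual potential; hard reset discards it.
            v = v - threshold if soft_reset else 0.0
        else:
            spikes.append(0)
        potentials.append(v)
    return spikes, potentials


# Hypothetical input currents chosen so the two reset rules diverge.
currents = [1.75, 0.75, 0.0]
soft_spikes, soft_v = simulate_lif(currents, soft_reset=True)
hard_spikes, hard_v = simulate_lif(currents, soft_reset=False)
```

With these inputs the residual potential left by the soft reset lets the neuron fire again on the second step, whereas the hard-reset neuron stays silent.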
MDAM: Multi-Dimensional Attention Module for Anomalous Sound Detection
ABSTRACT. Unsupervised anomalous sound detection (ASD) is a challenging task that involves training a model to differentiate between normal and abnormal sounds in an unsupervised manner. The difficulty of the task increases when there are acoustic differences (domain shift) between the training and testing datasets. To address these issues, this paper proposes a state-of-the-art ASD model based on self-supervised learning. Firstly, we design an effective attention module called the Multi-Dimensional Attention Module (MDAM). Given a shallow feature map of sound, this module infers attention along three independent dimensions: time, frequency, and channel. It focuses on specific frequency bands that contain discriminative information and time frames relevant to semantics, thereby enhancing the representation learning capability of the network model. MDAM is a lightweight and versatile module that can be seamlessly integrated into any CNN-based ASD model. Secondly, we propose a simple domain generalization method that increases domain diversity by blending the feature representations of different domain data, thereby mitigating domain shift. Finally, we validate the effectiveness of the proposed methods on DCASE 2022 Task 2 and DCASE 2023 Task 2.
A Reinforcement Learning Method for Generating Class Integration Test Orders Considering Dynamic Couplings
ABSTRACT. In recent years, with the rapid development of artificial intelligence, reinforcement learning has made significant progress in various fields. However, there are still some challenges when applying reinforcement learning to solve problems in software engineering. The generation of class integration test orders is a key challenge in object-oriented program integration testing. Previous research mainly focused on static couplings and neglected dynamic couplings, leading to inaccurate cost measurement of class integration test orders. In this paper, we propose a reinforcement learning method to generate class integration test orders considering dynamic couplings. Firstly, the concept of dynamic couplings generated by polymorphism is introduced, and a strategy for measuring the stubbing complexity of simulating dynamic dependencies is proposed. Then, we combine this new stubbing complexity with a reinforcement learning method to generate class integration test orders and achieve the optimal result with minimal overall stubbing complexity. Comprehensive experiments show that our proposed approach outperforms other methods in measuring the cost of class integration test orders.
LDW-RS Loss: Label Density-Weighted Loss with Ranking Similarity Regularization for Imbalanced Deep Fetal Brain Age Regression
ABSTRACT. Estimation of fetal brain age based on sulci by magnetic resonance imaging (MRI) is crucial in determining the normal development of the fetal brain. Deep learning provides a possible way for fetal brain age estimation from MRI. However, real-world MRI datasets often present an imbalanced label distribution, causing the model to show an undesirable bias towards the majority labels. Thus, many methods have been designed for imbalanced regression. Nevertheless, most methods for handling imbalanced data focus on targets with discrete categorical indices, without considering the continuous and ordered nature of target values. To fill the research gap of fetal brain age estimation with imbalanced data, we propose a novel label density-weighted loss with a ranking similarity regularization (LDW-RS) for deep imbalanced regression of the fetal brain age. The label density-weighted loss is designed to capture information about the similarity between neighboring samples in the label space. The ranking similarity regularization is developed to establish a global constraint for calibrating the biased feature representations learned by the network. A total of 1327 MRI images from 157 healthy fetuses between 22 and 34 weeks were used in our experiments for the fetal brain age estimation regression task. In the random experiments, our LDW-RS achieved promising results with an average mean absolute error of 0.760±0.066 weeks and an R-square (R^2) coefficient of 0.914±0.020. Our fetal brain age estimation algorithm might be useful for identifying abnormalities in brain development and reducing the risk of adverse development in clinical practice.
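One way to picture a label density-weighted loss is sketched below. This is only an illustration under stated assumptions, not the LDW-RS formulation itself: the label density is estimated with a Gaussian kernel over the training labels, each sample is weighted by the inverse of that density, and the weights are applied to an L1 loss. The function names, the bandwidth and the gestational ages are hypothetical, and the ranking similarity regularization is not covered.

```python
import math


def label_density(labels, bandwidth=1.0):
    """Gaussian kernel density estimate at each training label (assumed kernel)."""
    n = len(labels)
    return [
        sum(math.exp(-0.5 * ((y - other) / bandwidth) ** 2) for other in labels) / n
        for y in labels
    ]


def density_weights(labels, bandwidth=1.0):
    """Inverse-density weights, rescaled so that their mean is 1."""
    raw = [1.0 / d for d in label_density(labels, bandwidth)]
    mean_raw = sum(raw) / len(raw)
    return [w / mean_raw for w in raw]


def weighted_l1(preds, labels, weights):
    """Mean absolute error with a per-sample weight."""
    return sum(w * abs(p - y) for p, y, w in zip(preds, labels, weights)) / len(labels)


# Hypothetical gestational ages in weeks: four common labels and one rare label.
ages = [28.0, 28.5, 29.0, 29.5, 34.0]
weights = density_weights(ages)
loss = weighted_l1([27.0, 28.5, 29.0, 29.5, 32.0], ages, weights)
```

Because the kernel smooths over neighboring labels, samples in densely populated age ranges share a low weight, while the isolated 34-week sample receives the largest weight.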
A Corpus of Quotation Element Annotation for Chinese Novels: Construction, Extraction and Application
ABSTRACT. Quotations or dialogues are important for literary works such as novels. In the famous novels of Jinyong, about half of all sentences are quotations. Quotation elements such as the speaker, speech mode, speech cue and the quotation itself are very useful for the analysis of fictional characters. To build models for automatic quotation element extraction, we construct the first quotation corpus with annotation of all four quotation elements; with 31,922 quotations, it is the largest such corpus to our knowledge. Based on the corpus, we propose baseline models for quotation element extraction and conduct extensive experiments. To explore the application of extracted quotation elements, we carry out character classification and gender classification, and find that quotation and speech mode are effective for the two tasks. We will extend our work from Jinyong's novels to other novels to analyze various characters from different angles based on quotation structures.
A lightweight safety helmet detection network based on bidirectional connection module and Polarized Self-Attention
ABSTRACT. Safety helmets worn by construction workers in substations can reduce the accident rate in construction operations. With the mature development of smart grid and target detection technology, automatic monitoring of helmet wearing using the cloud-side collaborative approach is of great significance in power construction safety management. However, existing target detectors perform a large number of redundant calculations in the process of multi-scale feature fusion, resulting in additional computational overhead. To solve this problem, we propose a lightweight target detection model, PFBDet. First, we design a cross-stage local bottleneck module, FNCSP, and, combining it with Polarized Self-Attention, propose an efficient lightweight feature extraction network, PFNet, to optimize the computational complexity while obtaining more feature information. Secondly, to address the redundancy overhead brought by multi-scale feature fusion, we design a bidirectional connection module (BCM) based on GSConv and the lightweight upsampling operator CARAFE, and, combining it with a single aggregation cross-layer network module, propose an efficient multi-scale feature fusion structure, BCM-PAN. To verify the effectiveness of the method, we conducted extensive experiments on helmet image datasets such as Helmeted, Ele-hat and SHWD, and the experimental results show that the proposed method has better recognition accuracy with less computational effort. Its performance is higher than that of most high-performance target detectors, and it can meet the requirements of real-time detection of helmet wearing by construction personnel in the construction scenarios of power systems. Code is available at: https://github.com/csworkcode/PFBDet
Decoupling Style from Contents for Positive Text Reframing
ABSTRACT. The positive text reframing (PTR) task, where the goal is to generate a text that gives a positive perspective to a reader while preserving the original sense of the input text, has attracted considerable attention as one of the major applications of natural language generation (NLG). In the PTR task, large annotated paired datasets are not available and would be expensive and time-consuming to create. Therefore, how to interpret a diversity of contexts and generate a positive perspective from a small training dataset is still an open problem. In this work, we propose a simple but effective Framework for Decoupling the sentiment Style from the Contents of the text (FDSC) for the PTR task. Different from previous work on the PTR task that directly fine-tunes Pre-trained Language Models (PLMs) on a task-specific labeled dataset such as Positive Psychology Frames (PPF), our FDSC fine-tunes the model on the input sequence with two special symbols to decouple style from the contents. We apply a contrastive learning strategy so that the model learns a more robust contextual representation while preserving the original meaning of the given input sequence. The experimental results on the PPF dataset show that our approach outperforms baselines when fine-tuning two popular Seq2Seq PLMs, BART and T5, and can achieve better text reframing. Our codes are available online.
Segment Anything Model for Semi-Supervised Medical Image Segmentation via selecting reliable pseudo-labels
ABSTRACT. Semi-Supervised Learning (SSL) has become a hot topic due to its lower dependence on annotated data compared to fully supervised methods. This advantage is more evident in the field of medical imaging, where acquiring labeled data is difficult. Generating pseudo-labels for unlabeled images is one of the most classic and intuitive methods in semi-supervised segmentation. However, this method may also introduce unreliable pseudo-labels that provide incorrect guidance to the model and impair its performance. The lack of ground truth labels for unlabeled images makes it hard to evaluate the reliability of pseudo-labels. In this paper, we present an SSL framework with a simple but effective strategy to select reliable pseudo-labels by leveraging the Segment Anything Model (SAM) for segmentation. Concretely, the SSL model trained with domain knowledge provides the generated pseudo-labels to the SAM as prompts. Reliable pseudo-labels usually make SAM produce predictions consistent with the semi-supervised segmentation model. Based on this result, the reliable pseudo-labels are selected to further boost the existing semi-supervised learning methods. The experimental results show that the proposed strategy effectively improves the performance of different algorithms in semi-supervised scenarios. On the publicly available ACDC dataset, the proposed method achieves 6.84% and 10.76% improvement over two advanced baselines, respectively, with 5% of labeled data. The extended experiments on pseudo-labels verified that the quality of the reliable pseudo-labels selected by the proposed strategy is superior to that of the unreliable pseudo-labels. This study may provide a new avenue for SSL medical image segmentation.
Multi-level Feature Enhancement Method For Medical Text Detection
ABSTRACT. In recent years, significant progress has been made in the field of text detection. Segmentation-based text detection methods have been widely applied in the field of text detection. However, when it comes to tasks involving dense text, extreme aspect ratios, and multi-oriented text, the current detection methods still fail to achieve satisfactory performance. In this paper, we propose an efficient and accurate text detection system that incorporates an efficient segmentation module and a learnable post-processing method. More specifically, the segmentation head consists of an Efficient Feature Enhancement Module (EFEM) and a Multi-Scale Feature Fusion module (MSFM). The EFEM is a cascaded U-shaped module that incorporates a spatial attention mechanism to introduce multi-level information for improved segmentation performance. MSFM integrates features from the EFEM at different depths and scales to generate the final features for segmentation. Moreover, a post-processing module employing a differentiable binarization approach is utilized, allowing the segmentation network to adaptively set the binarization threshold. This simplifies the post-processing workflow and enhances the performance of text detection. The proposed model demonstrates excellent performance on medical text image datasets. Multiple benchmark experiments further validate the superior performance of the proposed method. Code is available at: https://github.com/csworkcode/EFDBNet
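The differentiable binarization step named in the abstract above can be sketched as a steep sigmoid applied to the difference between a probability map and a learned threshold map. This is a minimal illustration of the general technique, not the code of the linked repository; the function names, the amplification factor `k=50` and the toy maps are assumptions.

```python
import math


def differentiable_binarize(prob, thresh, k=50.0):
    """Approximate the step function (prob > thresh) with a steep sigmoid.

    Unlike a hard threshold, this is differentiable, so the threshold can be
    learned jointly with the segmentation network.
    """
    return 1.0 / (1.0 + math.exp(-k * (prob - thresh)))


def binarize_map(prob_map, thresh_map, k=50.0):
    """Apply the approximate binarization pixel by pixel to 2D nested lists."""
    return [
        [differentiable_binarize(p, t, k) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob_map, thresh_map)
    ]


# Hypothetical 2x2 probability and threshold maps.
prob_map = [[0.9, 0.2], [0.5, 0.7]]
thresh_map = [[0.3, 0.6], [0.5, 0.4]]
approx_binary = binarize_map(prob_map, thresh_map)
```

Pixels well above their local threshold map to values near 1, pixels well below map to values near 0, and a pixel exactly at its threshold maps to 0.5.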
DKCS: A Dual Knowledge-Enhanced Abstractive Cross-Lingual Summarization Method based on Graph Attention Networks
ABSTRACT. Cross-Lingual Summarization (CLS) is the task of generating a summary in one language for an article in a different language. Previous studies on CLS mainly take pipeline methods or train an attention-based end-to-end model on translated parallel datasets.
However, over-length source information and non-one-to-one bilingual mappings make it a big challenge for models to correctly summarize and translate important information. To solve this problem, we propose a dual knowledge-enhanced abstractive CLS model with a graph-encoder-decoder architecture. To enhance the summarization ability, it implements a Clue-Focused Graph Encoder that leverages the graph attention network to simultaneously learn inter-sentence structures and aggregate important information based on the guidance of extracted salient internal knowledge. Then it introduces a bilingual lexicon from external knowledge into the decoder with an attention layer to enhance the translation ability. In addition, we construct the first hand-written CLS dataset for evaluation.
Experimental results show that this method has stronger robustness for longer inputs and substantially outperforms the existing SOTA on both automatic and human evaluations.
A Stealth Security Hardening Method Based on SSD Firmware Function Extension
ABSTRACT. In recent years, issues related to information security have received increasing attention. Expanding the security-related functionality of SSD firmware can provide an additional method for implementing security features in the host system while taking advantage of the excellent performance of the SSD controller. This paper proposes a stealth security hardening method based on SSD Firmware Function Extension. By reverse engineering the firmware program and inserting jump instructions at specific locations, the firmware program can jump to and execute the extension program inserted into the original unused space of the firmware. This can be done without affecting the normal use of the SSD, realizing the functional expansion of the firmware, which mainly includes executing remote code sent by the host, invoking timers, directly reading and writing flash memory, and self-destruction under specific circumstances. The availability of extended functions and the change in read and write performance after the expansion were experimentally tested.
Aided diagnosis of autism spectrum disorder based on a mixed neural network model
ABSTRACT. The diagnosis of autism spectrum disorder (ASD) is a challenging task, especially for children. To determine whether a person has ASD, the conventional methods are questionnaires and behavioral observation, which may be subjective and cause misdiagnosis. To obtain an accurate diagnosis, we can explore quantitative imaging biomarkers and leverage machine learning to learn a classification model on these biomarkers for auxiliary ASD diagnosis. At present, many machine learning methods rely on resting-state fMRI data for feature extraction and auxiliary diagnosis. However, due to the heterogeneity of the data, there can be many noisy features that are adverse to diagnosis, and a lot of biometric information may not be fully explored. In this study, we designed a mixed neural network model of convolutional neural network (CNN) and graph neural network (GNN), termed MCG-Net, to extract discriminative information from the brain functional connectivity based on the resting-state fMRI data. We used the F-score and KNN algorithms to remove the redundant connectivities in the functional connectivity matrix at the global and local levels. Besides, the brain gradient features were first introduced in the model. A dataset of 848 subjects from 17 sites in the ABIDE dataset was adopted to evaluate the methods. The proposed method achieved better diagnostic performance than other existing methods, with a 4.56% improvement in accuracy.
A Joint Identification Network for Legal Event Detection
ABSTRACT. Detecting legal events is a crucial task in legal intelligence that involves identifying event types related to trigger word candidates in legal cases. While many studies have focused on using syntactic and relational dependencies to improve event detection, they often overlook the unique connections between trigger word candidates and their surrounding features. This paper proposes a new event detection model that addresses this issue by incorporating an initial scoring module to capture global information and a feature extraction module to learn local information. Using these two structures, the model can better identify event types of trigger word candidates. Additionally, adversarial training enhances the model's performance, and a sentence-length mask is used to modify the loss function during training, which helps mitigate the impact of missing trigger words. Our model has shown significant improvements over state-of-the-art baselines, and it won third prize in the event detection task at the Challenge of AI in Law 2022 (CAIL 2022).
A Supervised Spatio-Temporal Contrastive Learning Framework with Optimal Skeleton Subgraph Topology for Human Action Recognition
ABSTRACT. Human action recognition (HAR) is a hotspot in the field of computer vision, and models based on Graph Convolutional Networks (GCNs) show great advantages in skeleton-based HAR. However, most existing GCN-based methods do not consider the diversity of action trajectories and do not highlight the key joints. To address these issues, a supervised spatio-temporal contrastive learning framework with optimal skeleton subgraph topology for HAR (SSTCL-optSST) is proposed. SSTCL-optSST uses the samples with the same label as the target action (anchor) to build a positive sample set, each of which represents a trajectory of an action. The sample set is used to design a loss function that guides the model to recognize different poses of the action. Furthermore, the subgraphs of an original skeleton graph are used to construct a skeleton subgraph topology space; each subgraph in it is evaluated, and the optimal one is selected to highlight the key joints. Extensive experiments have been conducted on the NTU RGB+D 60 and Kinetics datasets, and the results show that our model has competitive performance.
A Hip-Knee Joint Coordination Evaluation System in Hemiplegic Individuals Based on Cyclogram Analysis
ABSTRACT. Inter-joint coordination analysis can provide deep insights into assessing patients' walking ability. This paper developed a hip-knee joint coordination assessment system.
Firstly, we introduced a hip-knee joint cyclogram generation model that takes into account walking speed. This model serves as a reference template for identifying abnormal patterns in the hip-knee joints when walking at different speeds.
Secondly, we developed a portable motion capture platform based on stereovision technology. It uses near-infrared cameras and markers to accurately capture kinematic data of the human lower limb.
Thirdly, we designed a hip-knee joint coordination assessment metric, DTW-ED (Dynamic Time Warping - Euclidean Distance), which can score the subject's hip-knee joint coordination.
Experimental results indicate that the hip-knee joint cyclogram generation model has an error range of [0.78°, 1.08°].
We conducted walking experiments with five hemiplegic subjects and five healthy subjects. The evaluation system successfully scored the hip-knee joint coordination of patients, allowing us to differentiate between healthy individuals and hemiplegic patients. This assessment system can also be used to distinguish between the affected and unaffected sides of hemiplegic subjects.
In conclusion, the hip-knee joint coordination assessment system developed in this paper has significant potential for clinical disease diagnosis.
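As a rough illustration of the idea behind a DTW-based cyclogram comparison, the sketch below computes a classic dynamic time warping distance between two sequences of (hip angle, knee angle) points, using the Euclidean distance as the local cost. It assumes the standard DTW recurrence; how the abstract's DTW-ED metric converts such a distance into a coordination score is not stated, so no scoring is attempted. The function names and all angle values are hypothetical.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two (hip_angle, knee_angle) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def dtw_distance(seq_a, seq_b):
    """Classic dynamic time warping cost with Euclidean local distance."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] is the best alignment cost of seq_a[:i] with seq_b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = euclidean(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # repeat a point of seq_b
                cost[i][j - 1],      # repeat a point of seq_a
                cost[i - 1][j - 1],  # advance both sequences
            )
    return cost[n][m]


# Hypothetical cyclogram samples in degrees: (hip angle, knee angle).
template = [(0, 0), (10, 20), (20, 40), (10, 20)]
slower_gait = [(0, 0), (10, 20), (10, 20), (20, 40), (10, 20)]  # same shape, different timing
offset_gait = [(3, 4), (13, 24), (23, 44), (13, 24)]            # shifted shape

same_shape_cost = dtw_distance(template, slower_gait)
offset_cost = dtw_distance(template, offset_gait)
```

The warping absorbs pure timing differences, so the slower gait with the same shape has zero cost, while the shifted cyclogram accumulates a nonzero cost.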
A Hybrid Approach Using Convolution and Transformer for Mongolian Ancient Documents Recognition
ABSTRACT. Mongolian ancient documents are an indispensable source for studying Mongolian traditional culture. To thoroughly explore and effectively utilize these documents, a comprehensive study of word recognition in Mongolian ancient documents is essential. To better recognize the word images in Mongolian ancient documents, this paper proposes an approach that combines convolutional neural networks with Transformer models. The approach takes word images as the input of the model. After passing through a feature extractor composed of convolutional neural networks, the extracted features are fed into a Transformer model for prediction. Finally, the corresponding recognition results of the word images are obtained. Because character classes are commonly imbalanced in recognition tasks, models often tend to focus excessively on common characters while neglecting rare characters. Our proposed approach integrates focal loss to enhance the model's attention towards rare characters, thereby improving the overall recognition performance of the model for all characters. After training, the model is capable of rapidly and efficiently performing end-to-end recognition of words in Mongolian ancient documents. The experimental results indicate that our proposed approach outperforms existing methods for word recognition in Mongolian ancient documents.
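The effect of focal loss on well-classified versus poorly classified characters can be seen in a minimal sketch. It uses the standard form FL(p_t) = -(1 - p_t)^gamma * log(p_t) without the optional class-balancing factor; the value gamma=2 and the probability vectors are assumptions, since the abstract does not give the settings used.

```python
import math


def cross_entropy(probs, target):
    """Standard cross-entropy for one sample given class probabilities."""
    return -math.log(probs[target])


def focal_loss(probs, target, gamma=2.0):
    """Focal loss: cross-entropy scaled by (1 - p_t)^gamma.

    Confident, correct predictions (p_t near 1) are down-weighted, so
    training focuses on hard samples such as rare characters.
    """
    p_t = probs[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# Hypothetical predictions: a common character classified confidently,
# and a rare character classified poorly. The true class is index 0 in both.
easy_probs = [0.9, 0.05, 0.05]
hard_probs = [0.1, 0.6, 0.3]

easy_ratio = focal_loss(easy_probs, 0) / cross_entropy(easy_probs, 0)
hard_ratio = focal_loss(hard_probs, 0) / cross_entropy(hard_probs, 0)
```

The ratio of focal loss to cross-entropy is about 0.01 for the easy sample and about 0.81 for the hard one, which is the re-weighting that shifts attention towards rare characters. Setting gamma to 0 recovers plain cross-entropy.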
Multi-Scale Feature Fusion Neural Network for Accurate Prediction of Drug-Target Interactions
ABSTRACT. Identification of drug-target interactions (DTI) is crucial in drug discovery and repositioning. However, identifying DTI is a costly and time-consuming process that involves conducting biological experiments with a vast array of potential compounds. To accelerate this process, computational methods have been developed, and with the growth of available datasets, deep learning techniques have been widely applied in this field. Despite numerous deep learning prediction models proposed for DTI prediction, many of them have several limitations. Firstly, they extract features solely from the complete protein sequence, which makes it challenging to utilize its information effectively. Secondly, they lack efficient fusion of drug and target features, which is essential for reflecting their interaction relationship. Additionally, many approaches solely treat DTI as a binary classification problem, neglecting the significance of binding affinity prediction, which reflects the strength of the interaction between drugs and targets. To address these limitations, we developed a multi-scale feature fusion neural network (MSF-DTI), which leverages the potential semantic information of amino acid sequences at multiple scales, enriches the feature representation of proteins, and fuses drug and target features using a designed feature fusion module for predicting drug-target interactions. According to experimental results, MSF-DTI outperforms other state-of-the-art methods in both DTI classification and binding affinity prediction tasks.
Incomplete Multi-view Subspace Clustering Using Non-Uniform Hyper-Graph for High-Order Information
ABSTRACT. Incomplete multi-view subspace clustering (IMSC) is intended to exploit the information of multiple incomplete views to partition data into their intrinsic subspaces. Existing methods try to exploit the high-order information of data to improve clustering performance, using tools such as tensor factorization and hyper-Laplacian regularization. Rather than relying on complex mathematical tools, why not seek considerable improvements through simpler means? To this end, we propose an incomplete multi-view subspace clustering method using a non-uniform hyper-graph (NUHG-IMSC), which makes a slight change to the usual way of constructing a uniform hyper-graph. We find a set of data points that have high similarity with the center point of each hyper-edge in high-dimensional space to be its neighbor samples; the cardinality of each hyper-edge is decided based on the distribution of the corresponding center point. This is a simple but effective way to utilize high-order information without adding computational burden or extra parameters. Besides the advantage that the partial samples can be reconstructed more reasonably, our method also benefits other parts of the whole IMSC framework, such as learning the view-specific affinity matrices, the weight of each view, and the unified affinity matrix. Experimental results on three multi-view datasets indicate the effectiveness of the proposed method.
Hazardous Driving Scenario Identification with Limited Training Samples
ABSTRACT. Extracting hazardous driving scenarios from naturalistic driving data is essential for creating a comprehensive test scenario library for autonomous driving systems. However, due to the sporadic and low-probability nature of hazardous driving events, the number of hazardous driving scenarios that can be collected within a short period is limited, resulting in insufficient scenario coverage and a lack of diverse training samples. Such limited samples also lead to poor generalization ability and unsatisfactory robustness of the hazardous driving scenario identification model. To address these challenges, we propose a method to augment the limited driving scenario data through generative adversarial networks. By integrating DIG and APA techniques into the original GAN framework, the quality of the generated driving scenarios is enhanced. Benefiting from the augmented samples, we can leverage a more sophisticated ResNet architecture for feature extraction from compressed motion profiles to identify hazardous driving scenarios. By incorporating the augmented samples into the training process of the hazardous driving scenario identification model, we achieve a 4% improvement in the AUC of the model, surpassing the existing state-of-the-art methods.
Vessel Behavior Anomaly Detection using Graph Attention Network
ABSTRACT. Vessel behavior anomaly detection is of great significance for ensuring navigation safety, combating maritime crimes, and maritime management. Unfortunately, most current research ignores the importance of temporal dependencies and correlations between ship features. We propose a novel framework for vessel behavior anomaly detection using a graph attention network (VBAD-GAN), which characterizes these complicated relationships and dependencies through a graph attention module that consists of a time graph attention module and a feature graph attention module. We also adopt a process of graph structure learning to obtain the correct feature graph structure. Moreover, we propose a joint detection strategy combining reconstruction and prediction modules to capture the local ship features and long-term relationships between ship features. We demonstrate the effectiveness of the graph attention module and the joint detection strategy through an ablation study. In addition, comparative experiments with three baselines, including quantitative analysis and visualization, show that VBAD-GAN outperforms all baselines.
Machine Unlearning with Affine Hyperplane Shifting and Maintaining for Image Classification
ABSTRACT. Machine unlearning enables a well-trained model to forget certain samples in the training set while keeping its performance on the remaining samples. Existing algorithms based on the decision boundary need to generate adversarial samples in input space in order to obtain the nearest but incorrect class labels. However, due to the nonlinearity of the decision boundary in input space, multiple iterations are needed to generate the adversarial samples, and the generated adversarial samples are affected by the noise bound of the adversarial attack, which greatly limits the speed and efficiency of unlearning. In this paper, a machine unlearning method with affine hyperplane shifting and maintaining is proposed for image classification, in which the nearest but incorrect class labels are obtained directly from the point-to-hyperplane distance, without generating adversarial samples for boundary shifting. Moreover, knowledge distillation is leveraged for boundary maintenance. Specifically, the output of the original model is decoupled into remaining class logits and forgetting class logits, and the remaining class logits are utilized to guide the unlearned model to avoid catastrophic forgetting. Our experimental results on CIFAR-10 and VGGFace2 demonstrate that the proposed method is very close to the retrained model in terms of classification accuracy and privacy guarantee, and is about 4 times faster than Boundary Shrink.
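The point-to-hyperplane idea in this abstract can be illustrated with a minimal sketch. It assumes a linear classification head with logits w_k · x + b_k, so the boundary between the true class y and another class k is the hyperplane (w_y - w_k) · x + (b_y - b_k) = 0. The function name, the toy weights, and the choice of the final linear layer are illustrative assumptions, not details taken from the paper.

```python
import math

def nearest_incorrect_class(features, weights, biases, true_class):
    """Return (class, distance) for the wrong class whose pairwise
    decision hyperplane lies closest to the feature vector.

    Assumption: a linear head, logit_k = w_k . x + b_k.
    """
    def logit(k):
        return sum(w * x for w, x in zip(weights[k], features)) + biases[k]

    z_true = logit(true_class)
    best_class, best_dist = None, math.inf
    for k in range(len(weights)):
        if k == true_class:
            continue
        # Normal vector of the hyperplane separating true_class from k.
        diff = [a - b for a, b in zip(weights[true_class], weights[k])]
        norm = math.sqrt(sum(d * d for d in diff))
        if norm == 0.0:
            continue  # identical weight rows define no hyperplane
        # Point-to-hyperplane distance |(w_y - w_k).x + (b_y - b_k)| / ||w_y - w_k||.
        dist = abs(z_true - logit(k)) / norm
        if dist < best_dist:
            best_class, best_dist = k, dist
    return best_class, best_dist

# Toy 2-D example with three classes (values are illustrative only).
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.0]
x = [2.0, 1.5]
label, distance = nearest_incorrect_class(x, W, b, true_class=0)
```

Because the distance has a closed form, no iterative adversarial search is needed in this sketch, which is the property the abstract highlights.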
An Interpretable Vulnerability Detection Method Based on Multi-task Learning
ABSTRACT. Vulnerability detection (VD) techniques are critical to software security and have been widely studied. Many recent research works have proposed VD approaches built with deep learning models and achieved state-of-the-art performance. However, due to the black-box characteristic of deep learning, these approaches typically have poor interpretability, making it challenging for analysts to understand the causes and mechanisms behind vulnerabilities. Although a few strategies have been presented to improve the interpretability of deep learning models, their outputs are still difficult to understand for those with little machine learning knowledge. In this study, we propose IVDM, an Interpretable Vulnerability Detection Framework Based on Multi-task Learning. IVDM integrates the VD and explanation generation tasks into a multi-task learning mechanism. It can generate natural-language explanations of the detected vulnerabilities while performing the VD task. Compared with existing methods, the explanations output by IVDM are easier to understand. Moreover, IVDM is built on a large-scale pre-trained model, which gives it cross-programming-language VD ability. Experimental results on both a dataset we collected and public datasets demonstrate the effectiveness and rationality of IVDM.
Learning Primitive-aware Discriminative Representations for Few-shot Learning
ABSTRACT. Few-shot learning (FSL) aims to learn a classifier that can be easily adapted to recognize novel classes with only a few labeled examples. Recently, some works on FSL have yielded promising classification performance, where the image-level feature is used to calculate the similarity among samples for classification. However, the image-level feature ignores abundant fine-grained and structural information of objects that could be transferable and consistent between seen and unseen classes. How can humans easily identify novel classes with several samples? Some studies from cognitive science argue that humans recognize novel categories based on primitives. Although base and novel categories are non-overlapping, they share some primitives in common. Inspired by the above research, we propose a Primitive Mining and Reasoning Network (PMRN) to learn primitive-aware representations based on a metric-based FSL model. Concretely, we first add a Self-supervision Jigsaw task (SSJ) for the feature extractor in parallel, guiding the model to encode visual patterns corresponding to object parts into feature channels. Moreover, to mine discriminative representations, an Adaptive Channel Grouping (ACG) method is applied to cluster and weight spatially and semantically related visual patterns to generate a set of visual primitives. To further enhance the discriminability and transferability of primitives, we propose a visual primitive Correlation Reasoning Network (CRN) based on a graph convolutional network to learn abundant structural information and internal correlation among primitives. Finally, a primitive-level metric is used for classification in a meta-task based on an episodic training strategy. Extensive experiments show that our method achieves state-of-the-art results on six standard benchmarks.
FEDEL: Frequency Enhanced Decomposition and Expansion Learning for Long-Term Time Series Forecasting
ABSTRACT. Long-term Time Series Forecasting (LTSF) is widely used in various fields, for example, power planning. LTSF requires models to capture the long-term dependent coupling in time series effectively. However, several problems hinder the predictive performance of existing models, including the inability to fully exploit the correlation dependence in time series, the difficulty in decoupling the complex cycles of the series in the time domain, and the error accumulation of iterative multi-step prediction. To address these issues, we designed a Frequency Enhanced Decomposition and Expansion Learning (FEDEL) model for LTSF. The model has linear complexity and three distinguishing features: (i) an extensive capacity depth regime that can effectively capture complex dependencies in long time series; (ii) decoupling of complex cycles using a sparse representation of the time series in the frequency domain; and (iii) a direct multi-step prediction strategy to generate the prediction series, which improves prediction speed and avoids error accumulation. We have conducted extensive experiments on eight real-world, large-scale datasets. The experimental results demonstrate that the FEDEL model performs significantly better than traditional methods and outperforms the current SOTA model in the field of LTSF in most cases.
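As a rough illustration of what a sparse frequency-domain representation of a series can look like, the sketch below computes a discrete Fourier transform, keeps only the k largest-magnitude coefficients, and transforms back. This is a generic top-k spectral filter under assumed settings (the toy series, `k=2`, and all function names are hypothetical); the abstract does not specify how FEDEL itself selects frequency components.

```python
import cmath
import math

def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a real or complex list."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def inverse_dft(spectrum):
    """Inverse DFT, returning the real part of each reconstructed sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]

def keep_top_k(spectrum, k):
    """Zero out all but the k largest-magnitude frequency coefficients."""
    order = sorted(range(len(spectrum)), key=lambda i: abs(spectrum[i]), reverse=True)
    keep = set(order[:k])
    return [c if i in keep else 0j for i, c in enumerate(spectrum)]

# Toy series: a strong cycle (2 periods) plus a weak cycle (5 periods).
n = 16
series = [
    math.cos(2 * math.pi * 2 * t / n) + 0.1 * math.cos(2 * math.pi * 5 * t / n)
    for t in range(n)
]
# A real cosine occupies a conjugate pair of bins, so k=2 isolates one cycle.
dominant_cycle = inverse_dft(keep_top_k(dft(series), k=2))
```

In this toy case the two retained coefficients recover the strong cycle alone, which shows how overlapping periodic components that are entangled in the time domain separate cleanly in the frequency domain.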
GoatPose: A Lightweight and Efficient Network with Attention Mechanism
ABSTRACT. Keypoint detection is an essential part of human pose estimation. However, due to resource constraints, it is still a challenge to deploy complex convolutional networks on edge devices. In this paper, we present GoatPose, a lightweight deep convolutional model for real-time human keypoint detection that incorporates an attention mechanism. Since the high computational cost is associated with the frequently used convolution block, we substitute it with the LiteConv block, which applies cheap linear operations to generate rich feature maps from the intrinsic features at low cost. This significantly accelerates the model but inevitably loses part of the spatial information. To compensate for the information loss, we introduce the NAM attention mechanism. By applying channel weighting, the model can focus more on the important features and enhance the feature representation. Results on the COCO dataset show the superiority of our model. With the complexity of our model reduced by half and the computational speed doubled, the accuracy of our model is basically the same as that of the backbone model. We further deploy our model on an NVIDIA Jetson TX2 to validate its real-time performance, indicating that our model can be deployed and widely adopted in real-world scenarios.
Sign Language Recognition for Low Resource Languages using Few Shot Learning
ABSTRACT. Sign Language Recognition (SLR) with machine learning is challenging due to the scarcity of data for most low-resource sign languages. Therefore, it is crucial to leverage a few-shot learning strategy for SLR. This research proposes a novel skeleton-based sign language recognition method using meta-learning and a prototypical network called ProtoSign. Furthermore, we contribute to the field by introducing a dynamic word-level Sinhala Sign Language (SSL) video dataset comprising 1110 videos over 50 classes. To our knowledge, this is the first publicly available SSL dataset. Our method is evaluated using two low-resource language datasets, including our dataset. The experiments report results at the 95% confidence level for both 5-way and 10-way tasks in 1-shot, 2-shot, and 5-shot settings. We have made our code and data publicly available.
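For readers unfamiliar with prototypical networks, the classification step can be sketched as follows: each class prototype is the mean of its support embeddings, and a query is assigned to the nearest prototype under squared Euclidean distance. The embeddings, class names, and function names below are made up for illustration; the skeleton encoder that would produce such embeddings in ProtoSign is not shown and is not described in the abstract.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify_by_prototype(support, query):
    """support maps class name -> list of embedding vectors (the 'shots')."""
    prototypes = {name: mean_vector(shots) for name, shots in support.items()}
    return min(prototypes, key=lambda name: squared_distance(prototypes[name], query))

# A 2-way, 2-shot episode with hypothetical 2-D embeddings.
support_set = {
    "sign_a": [[0.9, 0.1], [1.1, -0.1]],
    "sign_b": [[-1.0, 0.2], [-0.8, 0.0]],
}
predicted = classify_by_prototype(support_set, [0.7, 0.0])
```

An N-way, K-shot episode simply means `support_set` has N classes with K vectors each.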
Evaluation of football players' performance based on Multi-Criteria Decision Analysis approach and sensitivity analysis
ABSTRACT. The use of information systems and recommendation models in football has grown with the development of the sport and has become a popular way to improve performance. With their help, it is possible to make more informed and effective decisions about team management, the selection of training parameters, and the building of player line-ups. To this end, in this paper, we propose a decision model based on the Multi-Criteria Decision Analysis (MCDA) approach to assess defensive football players with respect to their overall and defensive skills. The model was examined with selected objective weighting techniques and MCDA methods to comprehensively analyze the potential footballers' performance scores. A sensitivity analysis is performed to indicate which factors of the game the players should focus on throughout the season to increase their performance evaluation score and, thus, become a more attractive choice for club managers. The results of the sensitivity analysis show that improving performance on particular criteria can significantly improve the players' evaluation scores.
A Multi-task Framework for Solving Multimodal Multiobjective Optimization Problems
ABSTRACT. In multimodal multiobjective optimization problems, there may be more than one Pareto optimal solution corresponding to the same objective vector. The key is to find solutions that are both converged and well-distributed. Even though existing evolutionary multimodal multiobjective algorithms take distances in both the decision space and the objective space into consideration, most of them still focus on convergence. This may cause them to miss regions of the decision space that are difficult to search while converging to the Pareto front. To resolve this problem and maintain diversity throughout the whole process, we propose a differential evolutionary algorithm in a multi-task framework (MT-MMEA). This framework uses an ε-based auxiliary task that concerns only diversity in the decision space and provides well-distributed individuals to the main task through a knowledge transfer method. The main task evolves using a non-dominated sorting strategy and outputs the final population as the result. MT-MMEA is comprehensively tested on two MMOP benchmarks and compared with six state-of-the-art algorithms. The results show that our algorithm achieves superior performance in solving these problems.
Graph Contrastive ATtention Network for Rumor Detection
ABSTRACT. Detecting rumors from the vast amount of information in online social media has become a formidable challenge. Rumor detection based on the rumor propagation tree benefits from crowd wisdom and has become an important research method for rumor detection. However, node representations in such methods rely on limited label information and lose a lot of node information when graph-level representations are obtained through pooling. In this paper, we propose a novel rumor detection model called Graph Attentive Self-Supervised Learning (GASSL). GASSL adopts a graph attention model as the encoder, applies graph self-supervised learning without negative label pairs as an auxiliary task to update network parameters, and combines multiple pooling techniques to obtain the graph-level representation of the rumor propagation tree. To verify the effectiveness of our model, we conduct experiments on two real-world datasets. The GASSL model outperforms the best baseline algorithms on both datasets, demonstrating the effectiveness of the proposed model.
E3-MG: End-to-End Expert Linking via Multi-Granularity Representation Learning
ABSTRACT. Expert linking is the task of linking mentions to their corresponding experts in a knowledge base (KB). Previous works that focused on explicit features did not fully exploit the fine-grained linkage inside the expert's work, which creates a serious semantic bias. Also, such models are more sensitive to specific experts owing to the isolated treatment of class-imbalanced instances. To address these issues, we propose E3-MG (End-to-End Expert Linking via Multi-Granularity Representation Learning), a unified multi-granularity learning framework. We adopt a cross-attention module that mines mutual information to highlight the expression of masterpieces or key attributes, and we design a multi-objective learning process that integrates contrastive learning and knowledge distillation to optimize coherence between experts via document-level coherence. E3-MG enhances the representation capability for diverse characteristics of experts and demonstrates good generalizability. We evaluate E3-MG on KB and external datasets, and our method achieves state-of-the-art (SOTA) performance.
Infrared Target Recognition Technology Based On Small Sample Learning
ABSTRACT. To address the problems of sparse sample data and the difficulty of embedded implementation under limited resources in military applications of infrared target recognition, this paper proposes a lightweight target recognition technology based on small sample learning. The technology improves the structure of the generator and discriminator networks in a cycle-consistent generative adversarial network model and realizes the translation from visible images to infrared images, so as to expand the training data. By improving the YOLOv5s network, the recognition accuracy is increased while keeping the model parameters at the same order of magnitude, and the high real-time processing capability is retained. The experimental results show that the generative adversarial network model designed in this project can process visible images and generate near-infrared images. After the generated images are added to the training set, the model accuracy is effectively improved. The improved YOLOv5s model is 2% higher than the original model in mAP@0.5 and is easier to embed.
Weakly Supervised Temporal Action Localization Through Segment Contrastive Learning
ABSTRACT. Weakly-supervised temporal action localization aims to learn to localize actions with only video-level labels. Traditional methods mainly treat the snippets contributing most to the video-level classification as actions. In general, they process each snippet individually, thus ignoring both the relationships between different action snippets and the productive temporal contextual relationships, which are critical for action localization. In this paper, we present the perspective that learning by comparing not only helps identify different actions but also helps separate the action from the background. Specifically, we propose BEMA (Bidirectional Exponential Moving Average) to utilize contextual information to obtain a more stable feature representation. In addition, we introduce an Inter-Segment Loss to refine the snippet representation in feature space and prevent misidentification of similar actions for accurate action classification, and an Intra-Segment Loss to separate action from background in feature space and locate precise temporal boundaries.
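One plausible reading of a bidirectional exponential moving average over snippet features is sketched below: run an EMA forward in time, run another backward in time, and average the two so that each snippet is smoothed with context from both directions. The smoothing factor `alpha`, the averaging of the two passes, and the function names are assumptions made for illustration; the abstract does not state how BEMA combines the directions.

```python
def ema(sequence, alpha):
    """Exponential moving average over a list of feature vectors.

    Each output is alpha * current + (1 - alpha) * previous output;
    the first output equals the first input.
    """
    smoothed = [list(sequence[0])]
    for vec in sequence[1:]:
        prev = smoothed[-1]
        smoothed.append([alpha * v + (1 - alpha) * p for v, p in zip(vec, prev)])
    return smoothed

def bidirectional_ema(sequence, alpha=0.5):
    """Average of a forward-in-time EMA and a backward-in-time EMA (assumed combination)."""
    forward = ema(sequence, alpha)
    backward = ema(sequence[::-1], alpha)[::-1]
    return [
        [(f + b) / 2 for f, b in zip(f_vec, b_vec)]
        for f_vec, b_vec in zip(forward, backward)
    ]

# Four snippets with 1-D features that flicker between 0 and 1.
snippets = [[0.0], [1.0], [0.0], [1.0]]
smoothed = bidirectional_ema(snippets, alpha=0.5)
```

The smoothed values vary less from snippet to snippet than the raw ones, which matches the stated goal of a more stable feature representation.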
Unsupervised Joint-Semantics Autoencoder Hashing for Multimedia Retrieval
ABSTRACT. Cross-modal hashing has emerged as a prominent approach for large-scale multimedia information retrieval, offering advantages in computational speed and storage efficiency over traditional methods. However, unsupervised cross-modal hashing methods still face challenges in the lack of practical semantic labeling guidance and handling of cross-modal heterogeneity. In this paper, we propose a new unsupervised cross-modal hashing method called Unsupervised Joint-Semantics Autoencoder Hashing (UJSAH) for multimedia retrieval. First, we introduce a joint-semantics similarity matrix that effectively preserves the semantic information in multimodal data. This matrix integrates the original neighborhood structure information of the data, allowing it to capture the associations between different modalities better. This ensures that the similarity matrix can accurately mine the underlying relationships within the data. Second, we design a dual prediction network-based autoencoder, which implements the interconversion of semantic information from different modalities and ensures that the generated binary hash codes maintain the semantic information of different modalities. Experimental results on several classical datasets show a significant improvement in the performance of UJSAH in multimodal retrieval tasks relative to existing methods. The experimental code is published at https://github.com/YunfeiChenMY/UJSAH.
TransCenter: Transformer in Heatmap and A New Form of Bounding Box
ABSTRACT. In current heatmap-based object detection, the task of the heatmap is to predict the positions of keypoints and their categories. However, since objects of the same category share the same channel in the heatmap, it is possible for their keypoints to overlap. When this occurs, existing heatmap-based detectors are unable to differentiate between the overlapping keypoints. To address this issue, we have designed a new heatmap-based object detection model called TransCenter. Our model decouples the tasks of predicting the object category and the keypoint position, and treats object detection as a set prediction task. We use a label assignment strategy to divide the predicted sets into positive and negative samples for training. The purpose of this is to allow different objects to have their own heatmap channel without sharing it with others, thereby completely eliminating overlap. To make the model easier to learn, we leverage the fact that heatmaps can reduce the solution space and propose a novel approach for predicting bounding boxes. We use the encoder-decoder structure of transformers, treat the prediction of bounding boxes as an encoding task, and use a heatmap to represent position and size. Then, we treat category prediction and offset prediction of the bounding box as decoding tasks, where the offset prediction is output through regression.
An Approach to Mongolian Neural Machine Translation Based on RWKV Language Model and Contrastive Learning
ABSTRACT. Low-resource machine translation (LMT) is a challenging task, especially for languages with limited resources like Mongolian. In this paper, we propose a novel Mongolian-to-Chinese machine translation approach based on the RWKV language model and augmented with contrastive learning. Traditional methods that perturb the embedding layer often suffer from issues such as semantic distortion and excessive perturbation, leading to training instability. To address these problems, we introduce a contrastive learning approach combined with adversarial perturbation. Additionally, the RWKV language model, as a new architecture, has been shown to be more efficient in terms of training and inference time than traditional transformer models in various natural language processing tasks. In this work, we employ the RWKV language model as the core of our machine translation model. We evaluate our approach on a benchmark dataset of Mongolian-to-Chinese parallel sentences. The experimental results demonstrate that our method outperforms the state-of-the-art approaches in Mongolian machine translation. Furthermore, our research indicates that the proposed approach significantly mitigates the training instability caused by adversarial perturbation and demonstrates the effectiveness of employing the RWKV language model in improving translation performance.
S-CGRU: An Efficient Model for Pedestrian Trajectory Prediction
ABSTRACT. In the development of autonomous driving systems, pedestrian trajectory prediction plays a crucial role. Existing models still face challenges in accurately capturing complex pedestrian actions in different environments, in handling large-scale data, and in real-time prediction efficiency. To address this, we have designed a novel Complex Gated Recurrent Unit (CGRU) model that combines the spatial expressiveness of complex numbers with the efficiency of Gated Recurrent Unit networks to establish a lightweight model. Moreover, we have incorporated a social force model to further develop a Social Complex Gated Recurrent Unit (S-CGRU) model specifically for predicting pedestrian trajectories. To improve computational efficiency, we conducted an in-depth study of the pedestrian's attention field of view in different environments to optimize the amount of information processed and increase training efficiency. Experimental verification on six public datasets confirms that the S-CGRU model significantly outperforms other baseline models not only in prediction accuracy but also in computational efficiency, validating the practical value of our model in pedestrian trajectory prediction.
Enhanced Generation of Human Mobility Trajectory with Multiscale model
ABSTRACT. Over the past three years, the COVID-19 pandemic has highlighted the importance of understanding how people travel in contemporary urban areas in order to produce high-quality policies for public health emergencies. In this paper, we introduce a multiscale generative model called MScaleGAN that generates human mobility trajectories. Unlike existing models, in which both location and time are discretized so that generated results concentrate on certain points, MScaleGAN can produce more detailed trajectories that better capture urban road systems spatially and the irregularity of human behavior temporally. Experimental results show that our method outperforms previous models on distribution similarities of individual and collective metrics computed against real GPS trajectories. Furthermore, we study the application of MScaleGAN to COVID-19 spread simulation and find that the spreading process under generated trajectories is similar to that under real data.
TKGR-RHETNE: A New Temporal Knowledge Graph Reasoning Model via Jointly Modeling Relevant History Event and Temporal Neighborhood Event Context
ABSTRACT. However, these approaches are unable to predict when an event will occur and exhibit the following limitations: (1) inference methods heavily rely on the recurrence or periodicity of events and disregard unobserved latent factors; (2) when aggregating information in a graph, they implicitly assume homogeneity; (3) there is information loss caused by long-term evolutionary processes. In response to these limitations, we propose a novel Temporal Knowledge Graph Completion Method with Transformer Hawkes Process and Random Walk Aggregation Strategy (TemT) for both TKG entity prediction and future time prediction tasks. In TemT, there are two main components that capture time evolution information based on historical snapshots and instantaneous structure information based on random walk strategy, respectively, facilitating the feature evolution of the Hawkes process. Comprehensive experiments on five public datasets demonstrate the superior performance of our proposed method.
SRLI: Handling Irregular Time Series with a Novel Self-supervised Model based on Contrastive Learning
ABSTRACT. The advancement of sensor technology has made it possible to use more sensors to monitor industrial systems, resulting in a large amount of irregular, unlabeled time-series data. Learning appropriate representations for these series is a very important but challenging task. This paper presents a self-supervised representation learning model, SRLI (Self-supervised Representation Learning for Irregularities). We use T-LSTM to construct the irregularity encoder block. Based on this, we design three data augmentation methods. First, the raw time-series data are transformed into different yet correlated views. Second, we propose a contrasting module to learn robust representations. Lastly, to further learn discriminative representations, we reconstruct the series and try to obtain the imputation values of the unobserved positions. Rather than working in a two-stage manner, our framework generates the instance-level representation for ISMTS directly. Experiments show that this model performs well on multiple datasets.
ABSTRACT. Currently, research on events mainly focuses on the task of event extraction from text, which aims to extract trigger words and arguments and is a fine-grained classification task. Although some researchers have improved the event extraction task by additionally constructing external image datasets, these images do not come from the original source of the text and cannot be used for detecting real-time events. To detect events in multimodal data on social media, we propose a new multimodal approach that utilizes text-image pairs for event classification. Our model uses the unified language pre-trained model CLIP to obtain feature representations of text and images, and builds a Transformer encoder as a fusion module to achieve interaction between modalities, thereby obtaining a good multimodal joint representation. During pre-training, the CLIP model can establish correlations between modalities through the guidance of contrastive learning objectives, resulting in unimodal representations with good generalization ability. Experimental results show that the proposed model outperforms several state-of-the-art methods.
ADV-POST: Physically Realistic Adversarial Poster for Attacking Semantic Segmentation Models in Autonomous Driving
ABSTRACT. In recent years, deep neural networks have gained significant popularity in real-time semantic segmentation tasks, particularly in the domain of autonomous driving. However, these networks are susceptible to adversarial examples, which pose a serious threat to the safety of autonomous driving systems. Existing adversarial attacks on semantic segmentation models primarily focus on the digital space and lack validation in real-world scenarios, or they generate meaningless and visually unnatural examples. To address this gap, we propose a method called Adversarial Poster (ADV-POST), which generates physically plausible adversarial patches that preserve semantic information and visual naturalness by adding small-scale noise to posters. Specifically, we introduce a dynamic regularization method that balances the effectiveness and intensity of the generated patches. Moreover, we conduct comprehensive evaluations of the attack effectiveness in both digital and physical environments. Our experimental results demonstrate that ADV-POST successfully misleads real-time semantic segmentation models in the context of autonomous driving, resulting in inaccurate semantic segmentation results.
Three-dimensional rotation knowledge representation learning based on graph context
ABSTRACT. The goal of knowledge graph representation learning is to map entities and relations into low-dimensional continuous vector spaces in order to learn representations of their semantic information. However, most existing models struggle to model the basic features of knowledge graphs effectively, such as symmetric/antisymmetric, inverse, and combinatorial relational patterns. In addition, many models ignore the information about the neighborhood of entities in the triples in the graph. To solve these problems, this paper proposes a three-dimensional rotation knowledge graph representation learning model based on graph context. The model first uses the quaternion mathematical framework to represent each entity as a set of vectors in three-dimensional space, and interprets each relation as a three-dimensional rotation between entities. Then, by calculating the semantic similarity between entities and relations, the graph context information is fused into the vector representation. Experiments on the public datasets FB15K-237 and WN18RR demonstrate the effectiveness of the proposed model. The experimental results show that the model captures the relational patterns of knowledge graphs better and makes full use of the neighborhood information of entities in the graph.
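The quaternion mechanics behind a three-dimensional rotation can be sketched briefly: a unit quaternion q built from an axis and an angle rotates a vector v via the product q v q*, where q* is the conjugate. The sketch below shows only this rotation step on a single 3-D vector; the names `head` and `tail`, the chosen axis and angle, and the idea of comparing the rotated head with the tail are illustrative assumptions, since the abstract does not give the model's scoring function.

```python
import math

def quat_multiply(p, q):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (
        pw * qw - px * qx - py * qy - pz * qz,
        pw * qx + px * qw + py * qz - pz * qy,
        pw * qy - px * qz + py * qw + pz * qx,
        pw * qz + px * qy - py * qx + pz * qw,
    )

def rotate(vector, axis, angle):
    """Rotate a 3-D vector about an axis by an angle (radians) using q v q*."""
    norm = math.sqrt(sum(a * a for a in axis))
    ux, uy, uz = (a / norm for a in axis)
    half = angle / 2
    s = math.sin(half)
    q = (math.cos(half), ux * s, uy * s, uz * s)
    q_conj = (q[0], -q[1], -q[2], -q[3])
    v = (0.0, *vector)  # embed the vector as a pure quaternion
    _, x, y, z = quat_multiply(quat_multiply(q, v), q_conj)
    return (x, y, z)

# Hypothetical example: a relation acting as a 90-degree rotation about z.
head = (1.0, 0.0, 0.0)
tail = rotate(head, axis=(0.0, 0.0, 1.0), angle=math.pi / 2)
```

Rotations compose by quaternion multiplication and invert by conjugation, which is why rotation-based models are a natural fit for the inverse and combinatorial relation patterns mentioned in the abstract.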
ABSTRACT. Based on UNet, numerous outstanding image restoration models have been developed, and Uformer is no exception. The exceptional restoration performance of Uformer is attributable not only to its novel modules but also to the network’s greater depth. Increased depth does not always lead to better performance, but it does increase the number of parameters and the training difficulty. In this paper, we propose Uformer++, a reconstructed Uformer based on an efficient ensemble of UNets of varying depths that partially share an encoder and co-learn simultaneously under deep supervision. Our proposed architecture has significantly fewer parameters than the vanilla Uformer while still achieving promising results. Considering that different channel-wise features carry differently weighted information, as do pixel-wise features, a novel Nonlinear Activation Free Feature Attention (NAFFA) module combining Simplified Channel Attention (SCA) and Simplified Pixel Attention (SPA) is added to the model. The experimental results on various challenging benchmarks demonstrate that Uformer++ has the least computational cost while maintaining performance.
Exploring Efficient-Tuned Learning Audio Representation Method from BriVL
ABSTRACT. Recently, there has been an increase in the popularity of multimodal approaches in audio-related tasks, which involve using not only the audible modality but also textual or visual modalities in combination with sound. In this paper, we propose a robust audio representation learning method, WavBriVL, based on Bridging-Vision-and-Language (BriVL). It projects audio, image, and text into a shared embedding space, so that multimodal applications can be realized. We tested it on some downstream tasks, presented the images rearranged by our method, and evaluated them qualitatively and quantitatively. The main purposes of this article are to (1) explore new correlation representations between audio and images, and (2) explore a new way to generate images using audio. The experimental results show that this method can effectively match audio to images.
TextBFA: Arbitrary Shape Text Detection with Bidirectional Feature Aggregation
ABSTRACT. Scene text detection has achieved great progress recently. However, it is challenging to detect arbitrary-shape text in scene images with complex backgrounds, especially inconspicuous and long text. To tackle this issue, we propose an effective text detection network, termed TextBFA, that strengthens the text feature by aggregating high-level semantic features. Specifically, we first adopt a bidirectional feature aggregation network to propagate and collect information on feature maps. Then, we exploit a bilateral decoder with lateral connections to recover the low-resolution feature maps for pixel-wise prediction. Extensive experiments demonstrate the detection effectiveness of the proposed method on several benchmark datasets, especially on inconspicuous text detection.
ABSTRACT. Vision-Language Pre-training (e.g., CLIP) bridges image and language. Despite its great success in a wide range of zero-shot downstream tasks, it still underperforms in some abstract or systematic tasks, such as classifying the distance to the nearest car. Recently, however, some researchers found that vision-language pre-trained models are able to estimate monocular depth, and their performance even approaches that of some earlier fully-supervised methods. Given these conflicting findings, in this paper, we focus on the question: can vision-language pre-trained models really understand depth? If so, how well do they perform? To answer these two questions, we propose MonoCLIP, which attempts to fully exploit the potential of vision-language pre-trained models by introducing three basic depth estimators and global context guided depth fusion. Results on two mainstream monocular depth estimation datasets demonstrate the ability of vision-language pre-trained models to understand depth. Moreover, adequate ablation studies further shed light on why and how it works.
Remaining Useful Life Prediction of Control Moment Gyro in Orbiting Spacecraft based on Variational Autoencoder
ABSTRACT. The telemetry data generated by the key components of spacecraft during orbital operation contain a lot of degradation information, but these telemetry data are large in volume and high in dimensionality, which makes them difficult to process. In this paper, we present CMG-VAE, a variational autoencoder-based method for predicting the remaining useful life of control moment gyros in orbiting spacecraft. The method improves the structure of the variational autoencoder. In the encoding phase, a temporal convolutional network is used to extract time-dependent information from the telemetry data, while a graph representation learning approach is used to obtain structural information between the dimensions of the data. The final output of the encoding part is obtained by weighted fusion of the two parts of information using a feature fusion approach. One part of the newly fused features is fed into the decoder for data reconstruction and the other part is fed into the remaining life prediction module. To evaluate the effectiveness of the proposed method, this paper uses a set of control moment gyro data obtained from a space station and NASA's C-MAPSS simulation dataset for validation. The experimental results show that our proposed method achieves the best results compared to other state-of-the-art benchmarks. In particular, on the control moment gyro dataset, the root mean square error (RMSE) obtained by the proposed method is reduced by 24% compared to the best-performing baseline method.
MRRC: Multi-Agent Reinforcement Learning with Rectification Capability in Cooperative Tasks
ABSTRACT. Motivated by the centralised training with decentralised execution (CTDE) paradigm, multi-agent reinforcement learning (MARL) algorithms have made significant strides in addressing cooperative tasks. However, the challenges of sparse environmental rewards and limited scalability have impeded further advancements in MARL. In response, MRRC, a novel actor-critic-based approach is proposed. MRRC tackles the sparse reward problem by equipping each agent with both an individual policy and a cooperative policy, harnessing the benefits of the individual policy’s rapid convergence and the cooperative policy’s global optimality. To enhance scalability, MRRC employs a monotonic mix network to rectify the state value function Q for each agent, yielding the joint value function Q_tot to facilitate global updates of the entire critic network. Additionally, the Gumbel-Softmax technique is introduced to rectify discrete actions, enabling MRRC to handle discrete tasks effectively. By comparing MRRC with advanced baseline algorithms in the "Predator-Prey" and challenging "SMAC" environments, as well as conducting ablation experiments, the superior performance of MRRC is demonstrated in this study. The experimental results reveal the efficacy of MRRC in reward-sparse environments and its ability to scale well with increasing numbers of agents.
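The Gumbel-Softmax relaxation mentioned in the abstract above can be illustrated with a minimal sketch. This shows only the generic technique, not MRRC's implementation; the function name `gumbel_softmax`, the temperature `tau`, and the example logits are assumptions made for illustration.

```python
import math
import random

def gumbel_softmax(logits, tau=1.0, rng=random):
    """Return a relaxed (differentiable-style) one-hot sample over discrete actions."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise: g = -log(-log(u)), with u ~ Uniform(0, 1).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / tau)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical usage: three discrete actions, low temperature (closer to one-hot).
sample = gumbel_softmax([2.0, 0.5, -1.0], tau=0.5, rng=random.Random(0))
```

Lower values of `tau` push the sample toward a one-hot vector, while higher values make it smoother.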
A neural network architecture for accurate 4D vehicle pose estimation from monocular images with uncertainty assessment
ABSTRACT. This paper proposes a new neural network architecture for estimating the four degrees of freedom poses of vehicles from monocular images in an uncontrolled environment. The neural network learns how to reconstruct 3D characteristic points of vehicles from image crops and coordinates of 2D keypoints estimated from these images. The 3D and 2D points are used to compute the vehicle pose solving the Perspective-n-Point problem, while the uncertainty is propagated by applying the Unscented Transform. Our network is trained and tested on the ApolloCar3D dataset, and we introduce a novel method to automatically obtain approximate labels for 3D points in this dataset. Our system outperforms state-of-the-art pose estimation methods on the ApolloCar3D dataset, and unlike competitors, it implements a full pipeline of uncertainty propagation.
Domain Generalization via Implicit Domain Augmentation
ABSTRACT. Deep convolutional neural networks often suffer significant performance degradation when deployed to an unknown domain. To tackle this problem, domain generalization (DG) aims to generalize the model learned from source domains to an unseen target domain. Prior work mostly focused on obtaining robust cross-domain feature representations, while neglecting the generalization ability of the classifier. In this paper, we propose a novel approach named Implicit Domain Augmentation (IDA) for classifier regularization. Our motivation is to prompt the classifier to see more diverse domains and thus become more knowledgeable. Specifically, the styles of samples are transferred and re-equipped to the original features. To obtain the direction of meaningful style transfer, we use the multivariate normal distribution to model the feature statistics. Then new styles are sampled from the distribution to simulate potential unknown domains. To efficiently implement IDA, we achieve domain augmentation implicitly by minimizing an upper bound of the expected cross-entropy loss on the augmented feature set instead of generating new samples explicitly. As a plug-and-play technique, IDA can be easily applied to other DG methods and boost their performance while introducing negligible computational overhead. Experiments on several tasks demonstrate the effectiveness of our method.
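As a rough illustration of the style-resampling idea described above, the sketch below performs the augmentation explicitly, which is the step IDA is stated to avoid by minimizing an upper bound. It assumes that "style" means the per-channel mean and standard deviation of a feature map and uses an independent Gaussian per channel as a simplification of the multivariate normal; all function names are hypothetical.

```python
import random
import statistics

def channel_stats(feature):
    """Per-channel mean and std of a feature map given as channels x positions."""
    means = [statistics.fmean(ch) for ch in feature]
    stds = [statistics.pstdev(ch) + 1e-6 for ch in feature]  # epsilon avoids division by zero
    return means, stds

def restyle(feature, new_means, new_stds):
    """Normalize each channel, then re-equip it with the new style statistics."""
    means, stds = channel_stats(feature)
    return [
        [(x - m) / s * ns + nm for x in ch]
        for ch, m, s, nm, ns in zip(feature, means, stds, new_means, new_stds)
    ]

def sample_style(batch, rng):
    """Sample a new style from Gaussians fitted to the batch's channel statistics."""
    all_stats = [channel_stats(f) for f in batch]
    new_means, new_stds = [], []
    for c in range(len(batch[0])):
        mu_values = [stats[0][c] for stats in all_stats]
        sd_values = [stats[1][c] for stats in all_stats]
        new_means.append(rng.gauss(statistics.fmean(mu_values), statistics.pstdev(mu_values)))
        # Keep the sampled standard deviation strictly positive.
        new_stds.append(abs(rng.gauss(statistics.fmean(sd_values), statistics.pstdev(sd_values))) + 1e-6)
    return new_means, new_stds

# Hypothetical usage: two feature maps with two channels and three positions each.
batch = [[[1.0, 2.0, 3.0], [0.0, 4.0, 8.0]], [[2.0, 4.0, 6.0], [1.0, 1.5, 2.0]]]
new_means, new_stds = sample_style(batch, random.Random(0))
styled = restyle(batch[0], new_means, new_stds)
```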
Domain Generalized Object Detection with Triple Graph Reasoning Network
ABSTRACT. Recent advances in Domain Adaptive Object Detection (DAOD) have vastly restrained the performance degradation caused by distribution shift. However, DAOD relies on the strong assumption that the target domain is accessible during the learning procedure, which is hard to satisfy in real-world applications. Domain Generalized Object Detection (DGOD) aims to generalize the detector trained on the source domains directly to an unknown target domain without accessing the target data. It is thus a much more challenging problem, and very few contributions have been reported. Extracting domain-invariant information is the key problem of domain generalization. Considering that the topological structure of objects does not change with the domain, we present a general DGOD framework, Triple Graph Reasoning Network (TGRN), to uncover and model the structure of objects. The proposed TGRN models the topological relations of foregrounds by building refined sparse graphs at both the pixel level and the semantic level. Meanwhile, a bipartite graph is created to capture the structural consistency of instances across domains, implicitly enabling distribution alignment. Experiments on our newly constructed datasets verify the effectiveness of the proposed TGRN. Codes and datasets will be available soon.
SCME: A Self-Contrastive Method for Data-free and Query-Limited Model Extraction Attack
ABSTRACT. Previous studies have revealed that artificial intelligence (AI) systems are vulnerable to adversarial attacks. Among them, model extraction attacks fool the target model by generating adversarial examples on a substitute model. The core of such an attack is training a substitute model as similar to the target model as possible, where the simulation process can be categorized as data-dependent or data-free. Compared with the data-dependent method, the data-free one has been proven to be more practical in the real world since it trains the substitute model with synthesized data rather than the actual data used to train the target model. However, the distribution of these fake data lacks diversity and cannot detect the decision boundary of the target model well, resulting in an unsatisfactory simulation effect. Besides, these data-free techniques need a vast number of queries to train the substitute model, increasing the time and computing consumption and the risk of exposure. To solve the aforementioned problems, in this paper we propose a novel data-free model extraction method named SCME (Self-Contrastive Model Extraction), which considers both the inter- and intra-class diversity in synthesizing fake data. In addition, SCME introduces the Mixup operation to augment the fake data, which can explore the target model's decision boundary effectively and improve the simulating capacity. Extensive experiments show that the proposed method can yield diversified fake data compared to existing methods. Moreover, our method shows superiority in many different attack settings under the query-limited scenario; for untargeted attacks in particular, SCME outperforms SOTA methods by 11.43% on average across five baseline datasets.
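For readers unfamiliar with Mixup, a minimal sketch of the generic operation follows. How SCME applies it to synthesized data and target-model queries is not specified in the abstract, so the function name `mixup`, the parameter `alpha`, and the soft-label handling are assumptions for illustration only.

```python
import random

def mixup(x1, y1, x2, y2, alpha=1.0, rng=random):
    """Blend two samples and their soft labels with a Beta(alpha, alpha) weight."""
    lam = rng.betavariate(alpha, alpha)
    mixed_x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    mixed_y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return mixed_x, mixed_y, lam

# Hypothetical usage: two 2-dimensional inputs with one-hot labels.
mixed_x, mixed_y, lam = mixup([0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0],
                              rng=random.Random(1))
```

Because the mixed sample lies on the segment between two inputs, it tends to land nearer a decision boundary than either endpoint, which is the usual intuition for why Mixup helps probe boundaries.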
Document-Level Relation Extraction with Relation Correlation Enhancement
ABSTRACT. Document-level relation extraction (DocRE) is a task that focuses on identifying relations between entities within a document. However, existing DocRE models often overlook the correlation between relations and lack a quantitative analysis of relation correlations. To address this limitation and effectively capture relation correlations in DocRE, we propose a relation graph method, which aims to explicitly exploit the interdependency among relations. Firstly, we construct a relation graph that models relation correlations using statistical co-occurrence information derived from prior relation knowledge. Secondly, we employ a re-weighting scheme to create an effective relation correlation matrix to guide the propagation of relation information. Furthermore, we leverage graph attention networks to aggregate relation embeddings. Importantly, our method can be seamlessly integrated as a plug-and-play module into existing models. Experimental results demonstrate that our approach can enhance the performance of multi-relation extraction, highlighting the effectiveness of considering relation correlations in DocRE.
CSEC: A Chinese Semantic Error Correction Dataset for Written Correction
ABSTRACT. Chinese text errors are typically divided into three categories: spelling, grammatical, and semantic. Existing research primarily focuses on addressing spelling and grammatical errors, which are single-word-level errors. A Chinese semantic error is a multi-word-level error mainly caused by improper collocation of two or more words. Semantic errors are also commonly found in professional writing domains such as academic writing and publication. However, few studies have been done on semantic errors due to a lack of datasets. To address this issue, we propose a new dataset, called CSEC (Chinese Semantic Error Correction), which includes 17,116 sentences and six primary types of Chinese semantic errors. Moreover, existing methods usually leverage the pronunciation and shape of words to solve Chinese text errors, and this information cannot help solve semantic mistakes. More attention should be paid to the dependency relationships of sentences to solve semantic mistakes. Therefore, we propose a novel method called Desket (Dependency Syntax Knowledge Enhanced Transformer). The Desket model solves the CSEC task by (1) capturing the semantic information of the sentence along with its dependency syntax, including dependency relationships and part-of-speech information, and (2) using dependency syntax to guide the generator to generate the correct output. Experiments on the CSEC dataset demonstrate the superior performance of our framework against existing methods. Our dataset will be released.
Multi-scale Directed Graph Convolution Neural Network for Node Classification Task
ABSTRACT. The existence of problems and objects in the real world which can be naturally modeled by complex graph structures has motivated researchers to combine deep learning techniques with graph theory. Despite the proposal of various spectral-based graph neural networks (GNNs), they still have shortcomings in dealing with directed graph-structured data and aggregating neighborhood information of nodes at larger scales. In this paper, we first improve the Lanczos algorithm with an orthogonality checking method and the modified Gram-Schmidt orthogonalization technique. Then, we build a long-scale convolution filter based on the improved Lanczos algorithm and combine it with a short-scale filter based on Chebyshev polynomial truncation to construct a multi-scale directed graph convolution neural network (MSDGCNN), which can aggregate multi-scale neighborhood information of directed graph nodes at larger scales. We validate our improved Lanczos algorithm on the atom classification task of the QM8 quantum chemistry dataset. We also apply the MSDGCNN to various real-world directed graph datasets (including WebKB, Citeseer, Telegram and Cora-ML) for the node classification task. The results show that our improved Lanczos algorithm has much better stability, and the MSDGCNN outperforms other state-of-the-art GNNs on this task on real-world datasets.
Cross-Modal Method based on Self-Attention Neural Networks for Drug-Target Prediction
ABSTRACT. Prediction of drug-target interactions (DTIs) plays a crucial role in drug retargeting, which can save costs and shorten time for drug development. However, existing methods are still unable to integrate the multimodal features of existing DTI datasets. In this work, we propose a new multi-head self-attention neural network approach, called SANN-DTI, for DTI prediction. Specifically, entity embeddings in the knowledge graph are learned using DistMult; this information then interacts with traditional drug and protein representations via multi-head self-attention neural networks; and finally DTIs are computed from the interaction features using fully connected neural networks. SANN-DTI was evaluated in three scenarios across two baseline datasets. Under ten-fold cross-validation, our model outperforms the most advanced methods. In addition, SANN-DTI has been applied to drug retargeting of breast cancer via HRBB2 targets. It was found that four of the top ten recommended drugs have been supported by the literature. Ligand-target docking results showed that the second-ranked drug in the recommended list had a clear affinity with HRBB2, which provides a promising approach for better understanding drug mode of action and drug repositioning.
UATR: An Uncertainty Aware Two-stage Refinement Model for Targeted Sentiment Analysis
ABSTRACT. Targeted sentiment analysis aims to predict the fine-grained sentiment polarity of a given term. Although some achievements have been made in recent years, the accuracy of targeted sentiment multi-classification technology is still insufficient: a considerable proportion of samples are incorrectly predicted as the opposite polarity. To this end, we investigate the effectiveness of utilizing model uncertainty and propose a two-stage refinement prediction model based on uncertainty, called UATR. UATR can model uncertainty by inferring the distribution of model weights and is more robust to small data learning. Experiments on the standard benchmark SemEval14 show that our model can not only reduce the proportion of samples incorrectly predicted as the opposite polarity, but also improve accuracy and F1 values by more than 2% and 3%, respectively, compared to the current state-of-the-art models.
AttIN: Paying More Attention to Neighborhood Information for Entity Typing in Knowledge Graphs
ABSTRACT. Entity types in knowledge graph (KG) have been employed extensively in downstream tasks of natural language processing (NLP). Currently, knowledge graph entity typing is usually inferred by embeddings, but a single embedding approach ignores interactions between neighbor entities and relations. In this paper, we propose an AttIN model that pays more attention to entity neighborhood information. More specifically, AttIN contains three independent inference modules, including a BERT module that uses the target entity neighbor to infer the entity type individually, a context transformer that aggregates information based on different contributions from the neighbor, and an interaction information aggregator (IIAgg) module that aggregates the entity neighborhood information into a long sequence. In addition, we use exponentially weighted pooling to process these predictions. Experiments on the FB15kET and YAGO43kET datasets show that AttIN outperforms existing competitive baselines while it does not need extra semantic information in the sparse knowledge graph.
Text-based Person Re-ID by Saliency Mask and Dynamic Label Smoothing
ABSTRACT. Current text-based person re-identification (re-ID) models tend to learn salient features of image and text, which, however, is prone to failure in identifying persons with very similar dress, because image contents with observable but indescribable differences may have identical textual descriptions. To address this problem, we propose a saliency mask based re-ID model to learn non-salient but highly discriminative features, which can work together with the salient features to provide more robust pedestrian identification. To further improve the performance of the model, a dynamic label smoothing based cross-modal projection matching loss (named CMPM-DS) is proposed to train our model, and our CMPM-DS can adaptively adjust the smoothing degree of the true distribution. We conduct extensive ablation and comparison experiments on two popular re-ID benchmarks to demonstrate the efficiency of our model and loss function, improving the existing best R@1 by 0.33% on CUHK-PEDE and 4.45% on RSTPReID.
Robust Multi-view Spectral Clustering with Auto-encoder for Preserving Information
ABSTRACT. Multi-view clustering is a prominent research topic in machine learning that leverages consistent and complementary information from multiple views to improve clustering performance. Graph-based multi-view clustering methods learn a consistent graph with pairwise similarity between data as edges, and generate sample representations using spectral clustering. However, most existing methods seldom consider recreating the input data from the encoded representation during the representation learning procedure, which results in information loss. To address this limitation, we propose a robust multi-view clustering with auto-encoder for preserving information (RMVSC-AE) that minimizes the reconstruction error between the input data and the reconstructed representation to preserve knowledge. Specifically, we discover a graph representation by jointly optimizing the graph Laplacian and auto-encoder reconstruction terms. Moreover, we introduce a sparse noisy term to further enhance the quality of the learned consistent graph. Extensive experiments on six multi-view datasets are conducted to verify the efficacy of the proposed method.
Improving Transferability of Adversarial Attack on Face Recognition with Feature Attention
ABSTRACT. The development of deep convolutional neural networks (DCNNs) has greatly improved face recognition (FR) techniques, which are used in many applications; this raises concerns because DCNNs are vulnerable to adversarial examples. In this work, we explore the robustness of FR models based on the transferability of adversarial examples. First, we argue that compared with adversarial examples generated by modelling output features of deep face models, the examples generated by modelling internal features have stronger transferability. After that, we propose to leverage the attention of the surrogate model to a pre-determined intermediate layer to seek the key features that different deep face models may share, which avoids overfitting on the surrogate model and narrows the gap between the surrogate model and the target model. In addition, in order to further enhance the black-box attack success rate, a multi-layer attack strategy is proposed, which enables the algorithm to generate perturbations guided by features of general interest to the models. Extensive experiments on four deep face models show the effectiveness of our method.
Pre-trained Financial Model for Price Movement Forecasting
ABSTRACT. We propose the Pre-trained Financial Model (PFM) for price movement forecasting, which is critical in the automated trading systems in the Stock and Futures markets. Inspired by recent successes of pre-trained large language models in tackling NLP tasks, our PFM adopts a pretraining-and-finetuning strategy for obtaining capable models that are adapted to various downstream price-forecasting tasks. During the pre-training stage, we train a sequence prediction backbone with multi-task learning by adopting both a supervised learning objective and an unsupervised regularization target. Our approach differs from the common masked language modeling (MLM) used in NLP studies. We develop a per-step target variable generation strategy for eliciting future predictions from the transformer encoder-decoder architecture. We verify our pre-trained model on various practical downstream forecasting tasks, including lagged movement regression, movement direction classification, and selective trading with top-performing stocks. Specifically, during the fine-tuning stage, we retain the pre-trained encoder and replace the decoder with specific downstream task decoders. We then perform supervised task-specific target generation learning as the fine-tuning process. Through extensive numerical studies and analysis, we demonstrate that our fine-tuned financial model can achieve a 5-15% improvement over downstream regression and classification tasks and over 40% in selective trading task.
Diff-Writer: A Diffusion Model Based Stylized Online Handwritten Chinese Character Generator
ABSTRACT. Online handwritten Chinese character generation is an interesting task which has gained more and more attention in recent years. Most of the previous methods are based on autoregressive models, where the trajectory points of characters are generated sequentially. However, this often makes it difficult to capture the global structure of the handwriting data. In this paper, we propose a novel generative model, named Diff-Writer, which can not only generate the specified Chinese characters in a non-autoregressive manner but also imitate the calligraphy style given a few style reference samples. Specifically, Diff-Writer is based on conditional Denoising Diffusion Probabilistic Models (DDPM) and consists of three modules: a character embedding dictionary, a style encoder, and an LSTM denoiser. The character embedding dictionary and the style encoder are adopted to model the content information and the style information, respectively. The denoiser iteratively generates characters using the content and style codes. Extensive experiments on a popular dataset (CASIA-OLHWDB) show that our model is capable of generating highly realistic and stylized Chinese characters.
Learning Discriminative Semantic and Multi-View Context for Domain Adaptive Few-shot Relation Extraction
ABSTRACT. Few-shot relation extraction enables the model to extract new relations and achieve impressive success. However, when new relations come from new domains, semantic and syntactic differences cause a dramatic drop in model performance. Therefore, the domain adaptive few-shot relation extraction task becomes important. However, existing works identify relations more by entities than by context, which makes it difficult to effectively distinguish different relations with similar entity semantic backgrounds in professional domains. In this paper, we propose a method called multi-view context representation with discriminative semantic learning (MCDS). This method learns discriminative entity representations and enhances the use of relational information in context thus effectively distinguishing different relations with similar entity semantics. Meanwhile, it filters partial entity information from the global information through an information filtering mechanism to obtain more comprehensive global information. We perform extensive experiments on the FewRel 2.0 dataset and the results show an average gain of 2.43% in the accuracy of our model on all strong baselines. We will release our code to facilitate future research.
Impulsion of Movie's Content-Based Factors in Multi-Modal Movie Recommendation System
ABSTRACT. Nowadays the recommendation system, a subclass of information filtering systems, needs no introduction, and the movie recommendation system plays an important role on streaming platforms, where a huge number of movies must be analyzed before showcasing a well-matched subset of them to users.
The existing works in this domain focus only on the output and treat the model's input as the same for all users. In practice, however, the movie embedding input vector varies from user to user. A user's perception of a movie depends on the movie's genre as well as its meta information (like story, director, and cast). To formulate this fact, we introduce two scores, (i) TextLike-score and (ii) GenreLike-score. Our proposed Cross-Attention-based Model outperforms the SOTA (state-of-the-art) by leveraging the effect of the scores and satisfying our factual notion.
In this paper, we evaluate our model's performance on two different datasets, (i) MovieLens-100K (ML-100K) and (ii) MFVCD-7K. Regarding multi-modality, the audio-video information of movies is used, and textual information is employed for score calculation. Finally, it is experimentally shown that the Cross-Attention-based multi-modal movie recommendation system with the proposed Meta-score successfully covers all the analytical queries supporting the purpose of the experiment.
Unleash the Capabilities of the Vision-Language Pre-training Model in Gaze Object Prediction
ABSTRACT. In a retail environment, it is valuable to evaluate the products of interest to perform accurate recommendations. However, the existing method (gaze following) only predicts the gaze area, and the prediction problem of gaze objects has not been fully explored. To this end, this paper proposes a new visual language model based on pre-trained large language models for the gaze object prediction framework, named EdgeCLIP. Primarily, we employ a set of adaptable and instructive cues to judiciously infuse instructional cues into the extensive language model, while proficiently retaining its pre-training knowledge. Secondly, we introduce a multi-head pooled attention block, MPATB, to achieve semantic enhancement and extract the joint representation of multimodal components, thereby mitigating the discrepancy in fixation points and subsequently reducing inaccurate predictions of gaze objects. Furthermore, we introduce a regulatory loss function that effectively governs the gaze heatmap within the stared box. A large number of experiments have proved that our model outperforms previous models. The code will be available in: https://github.com/fadaishaitaiyang/EdgeCLIP.
ABSTRACT. As a class of nonlinear subspace clustering methods, kernel subspace clustering has shown promising performance in many applications. This paper focuses on the kernel selection problem in the kernel subspace clustering model. Currently, the kernel function is typically chosen by the single kernel or multiple kernel methods. The former relies on a given kernel function, which poses challenges in clustering tasks with limited prior information, making it difficult to determine a suitable kernel function beforehand. Multiple kernel methods usually assume that the optimal kernel is near a series of predefined base kernels, which limits the expressive ability of the optimal kernel. Furthermore, multiple kernel methods tend to have higher solution complexity than single kernel methods.
To address these limitations, this paper utilizes contrastive learning to learn the optimal kernel adaptively and proposes the Contrastive Kernel Subspace Clustering (CKSC) method. Unlike multiple kernel approaches, CKSC is not constrained by the multiple kernel assumption. Specifically, CKSC integrates a contrastive regularization into the kernel subspace clustering model, encouraging neighboring samples in the original space to stay nearby in the reproducing kernel Hilbert space (RKHS). In this way, the resulting kernel mapping can preserve the cluster structure of the data, which will benefit downstream clustering tasks. The clustering experiments on seven benchmark data sets validate the effectiveness of the proposed CKSC method.
A frequency reconfigurable multi-mode printed antenna
ABSTRACT. A multi-frequency reconfigurable antenna is proposed. The designed antenna can be electronically tuned to operate in the 2.4 GHz band defined by the IEEE 802.11b standard and in the low-frequency part of the ultra-wideband (UWB) range. An L-shaped branch and a polygon patch are used as the main radiators of the antenna. Two varactor diodes are mounted on the slots and one PIN diode is mounted on the L-shaped branch to vary the effective electrical length of the antenna. The simulation and measurement results match well. With high frequency reconfiguration stability under guaranteed miniaturization, good impedance matching (S11 < -10 dB) is obtained in several operating bands, and the overall impedance bandwidth covers 2.34-2.58 GHz and 3.11-5.14 GHz. It provides solutions for operation within WiMAX, WLAN, and 5G sub-6 GHz bands.
ABSTRACT. With the explosive growth of Internet data, it has become quite easy to collect unlabeled training data in many practical machine learning applications, but it remains relatively difficult to obtain a large amount of labeled training data. Therefore, small data learning has attracted wide attention. Based on the idea of semi-supervised learning, we propose the mutually guided dendritic neural models (MGDNM) framework, which can expand labeled datasets. MGDNM utilizes two base classifiers that exchange data so as to achieve complementary advantages. To simulate the problem of insufficient labeled data, we used 20% of the dataset as the training dataset. On this basis, we conducted experiments on three datasets (Iris, Breast Cancer and Glass). Comparing accuracy and confusion matrices shows that the classification performance of MGDNM is significantly better than that of the dendritic neural model (DNM), Support Vector Machine (SVM), Gaussian Naive Bayes (GaussianNB) and Back Propagation Neural Network (BP). This shows that the MGDNM framework is effective and feasible.
Adaptive Accelerated Gradient Algorithm for Training Fully Complex-Valued Dendritic Neuron Model
ABSTRACT. This paper presents an adaptive complex-valued Nesterov accelerated gradient (ACNAG) algorithm for the training of fully complex-valued dendritic neural model (FCVDNM). Firstly, based on the complex-valued Nesterov accelerated gradient (CNAG) algorithm, an adaptive stepsize update method is introduced by using local curvature information. Secondly, the obtained adaptive stepsize is further constrained by scaling the Malitsky-Mishchenko criterion. Experimental results demonstrate the superior convergence and efficiency of the proposed algorithm compared to CNAG for the training of FCVDNM.
Lead ASR Models to Generalize Better Using Approximated Bias-Variance Tradeoff
ABSTRACT. The conventional recipe for Automatic Speech Recognition (ASR) models is to 1) train multiple checkpoints on a training set while relying on a validation set to prevent overfitting using early stopping and 2) average the last several checkpoints or those with the lowest validation losses to obtain the final model. In this paper, we rethink and update early stopping and checkpoint averaging from the perspective of the bias-variance tradeoff. Theoretically, the bias and variance represent the fitness and variability of a model, and the tradeoff between them determines the overall generalization error. However, it is impractical to evaluate them precisely. As an alternative, we take the training loss and validation loss as proxies of bias and variance and guide early stopping and checkpoint averaging using their tradeoff, namely an Approximated Bias-Variance Tradeoff (ApproBiVT). When evaluated with advanced ASR models, our recipe provides 2.5%-3.7% and 3.1%-4.6% CER reduction on AISHELL-1 and AISHELL-2, respectively.
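A minimal sketch of checkpoint selection and averaging guided by both losses follows. The abstract does not state how the two proxies are combined, so scoring a checkpoint by the plain sum of its training and validation loss is an assumption, as are the function name and the dictionary layout.

```python
def average_best_checkpoints(checkpoints, k):
    """Average the parameters of the k checkpoints with the lowest combined loss.

    Each checkpoint is a dict with "train_loss", "val_loss" and "params"
    (a mapping from parameter name to a scalar value, for simplicity).
    """
    # Assumed score: training loss (bias proxy) + validation loss (variance proxy).
    ranked = sorted(checkpoints, key=lambda c: c["train_loss"] + c["val_loss"])
    chosen = ranked[:k]
    names = chosen[0]["params"].keys()
    return {name: sum(c["params"][name] for c in chosen) / len(chosen) for name in names}

# Hypothetical usage with four checkpoints and a single scalar parameter "w".
checkpoints = [
    {"train_loss": 1.0, "val_loss": 1.0, "params": {"w": 0.0}},
    {"train_loss": 0.4, "val_loss": 0.5, "params": {"w": 2.0}},
    {"train_loss": 0.2, "val_loss": 0.9, "params": {"w": 4.0}},
    {"train_loss": 0.1, "val_loss": 1.5, "params": {"w": 9.0}},
]
averaged = average_best_checkpoints(checkpoints, k=2)
```

In this toy example the last checkpoint has the lowest training loss but is not selected, because its validation loss has risen.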
Traffic Signal Control Optimization Based on Deep Reinforcement Learning With Attention Mechanisms
ABSTRACT. Deep reinforcement learning (DRL) plays a vital role in adaptive traffic signal control. However, previous studies have frequently disregarded the significance of vehicles near intersections, which typically involve higher decision-making requirements and safety considerations. To overcome this challenge, this paper presents a novel DRL-based traffic signal control method that incorporates an attention mechanism (AM). Using the Dueling Double Deep Q Network (D3QN), this approach emphasizes the priority of vehicles near intersections by assigning them higher weights and more attention. Moreover, the state design incorporates signal light statuses to facilitate a more comprehensive understanding of the current traffic environment. Furthermore, the performance of the model is enhanced through the use of Double DQN and Dueling DQN techniques. The experimental findings demonstrate the superior efficacy of the proposed method in key metrics such as vehicle waiting time, queue length, and the number of halted vehicles when compared to D3QN, traditional DQN, and fixed timing strategies.
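The two standard ingredients of D3QN named above can be written compactly. The sketch below shows the usual dueling aggregation (state value plus mean-centred advantages) and the Double DQN target (action chosen by the online network, evaluated by the target network). It is a generic illustration under the textbook definitions, not the authors' network; the function names and the list-based inputs are assumptions, and the attention weighting is not modelled.

```python
from typing import List


def dueling_q_values(state_value: float, advantages: List[float]) -> List[float]:
    """Combine V(s) and A(s, a) as Q(s, a) = V(s) + A(s, a) - mean(A(s, .))."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + adv - mean_adv for adv in advantages]


def double_dqn_target(
    reward: float,
    gamma: float,
    next_q_online: List[float],
    next_q_target: List[float],
    done: bool,
) -> float:
    """Double DQN target: pick the action with the online net, score it with the target net."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]
```

Decoupling action selection from action evaluation is what reduces the overestimation bias of a plain DQN target.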
Task Scheduling With Improved Particle Swarm Optimization In Cloud Data Center
ABSTRACT. This paper proposes an improved particle swarm optimization algorithm with simulated annealing (IPSO-SA) for the task scheduling problem in cloud data centers. First, the algorithm uses Tent chaotic mapping to make the initial population more evenly distributed. Second, nonlinear adaptive inertia weights are incorporated to adjust the search capabilities of particles in different iteration periods. Finally, the Metropolis criterion from SA is used to generate perturbed particles, combined with a modified particle update equation to avoid premature convergence. Comparative experimental results show that the IPSO-SA algorithm improves convergence accuracy by 13.8% over the standard PSO algorithm. The respective improvements over the other two modified PSO algorithms are 15.2% and 9.1%.
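Two of the building blocks above have well-known generic forms, sketched here. The Tent map is used to spread initial positions over the search range, and the Metropolis criterion decides whether a worse perturbed particle is accepted. The function names, the seed `x0`, and the choice `mu = 1.99` are assumptions made for illustration (a value slightly below 2 avoids the collapse to zero that the exact Tent map shows in floating point); the paper's own parameters and update equation are not given in the abstract.

```python
import math
import random
from typing import List


def tent_init(
    num_particles: int,
    dim: int,
    lower: float,
    upper: float,
    x0: float = 0.37,
    mu: float = 1.99,
) -> List[List[float]]:
    """Initialise particle positions from a Tent chaotic sequence scaled to [lower, upper]."""
    x = x0
    population = []
    for _ in range(num_particles):
        position = []
        for _ in range(dim):
            # Tent map: x <- mu * x if x < 0.5, else mu * (1 - x); stays in [0, 1).
            x = mu * x if x < 0.5 else mu * (1.0 - x)
            position.append(lower + x * (upper - lower))
        population.append(position)
    return population


def metropolis_accept(delta: float, temperature: float, rng: random.Random) -> bool:
    """Accept improvements always; accept a worse move with probability exp(-delta / T)."""
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)
```

Here `delta` is the increase in cost of the perturbed particle over the current one, so a lower temperature makes worse moves less likely to be kept.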
Neighborhood Learning for Artificial Bee Colony Algorithm: A Mini-survey
ABSTRACT. The artificial bee colony (ABC) algorithm is a representative paradigm of swarm intelligence optimization (SIO) algorithms, which has received much attention in the field of global optimization for its good performance and simple structure. However, ABC has the drawback of strong exploration but weak exploitation, resulting in slow convergence speed and low convergence accuracy. To address this drawback, the neighborhood learning mechanism has emerged in recent years as an effective method and has become a hot research topic in the ABC community. However, there have been no surveys on it, not even a short one. Considering the appeal of the neighborhood learning mechanism, we are motivated to provide a mini-survey to highlight some key aspects of it, including 1) how to construct a neighborhood topology, 2) how to select the learning exemplar, and 3) what the advantages and disadvantages are. In this mini-survey, some related neighborhood-based ABC variants are reviewed to reveal the key aspects. Furthermore, some interesting future research directions are given to encourage deeper related work.
Two-Stage Attention Model to Solve Large-Scale Traveling Salesman Problems
ABSTRACT. The Traveling Salesman Problem (TSP) widely exists in real-world scenarios. Existing methods, such as exact, heuristic, and DL-based methods, face difficulties in solving large-scale TSPs within a short time. This paper proposes a two-stage attention model (TSAM) to address large-scale TSPs. Firstly, we design two distinct models: one aims to solve small-scale TSPs, while the other is designed to solve open-loop TSPs. Secondly, an efficient approach is developed to construct complete solutions for large-scale TSPs using the trained models. Finally, an improvement strategy is employed to enhance the quality of solutions. Experimental results demonstrate that TSAM can rapidly achieve promising solutions for TSP instances ranging from 500 to 10,000 nodes.
High-dimensional multi-objective PSO based on radial projection
ABSTRACT. When solving multi-objective problems, traditional methods face increased complexity and convergence difficulties as the number of objectives grows. This paper proposes a high-dimensional multi-objective particle swarm algorithm that uses radial projection to reduce the dimensionality of high-dimensional particles. Firstly, the solution vector space coordinates are normalized. Subsequently, the high-dimensional solution space is projected onto a 2-dimensional radial space to reduce computational complexity. Following this, grid partitioning is employed to enhance the efficiency and effectiveness of the optimization algorithm. Lastly, the iterative solution is obtained with the particle swarm optimization algorithm. In the process of iteratively updating particle solutions, the offspring reuse-based parent selection strategy and the maximum fitness-based elimination selection strategy are used to strengthen the diversity of the population, thereby enhancing the search ability of the particles. The computational expense is significantly reduced by projecting the solution onto a 2-dimensional radial space that exhibits characteristics comparable to the high-dimensional solution, while maintaining the distribution and crowding conditions of the complete point set. In addition, the offspring reuse-based parent selection strategy is used to update the external archive set, further avoiding premature convergence to a locally optimal solution. The experimental results verify the effectiveness of the proposed method. Compared with four state-of-the-art algorithms, the proposed algorithm has high search efficiency and fast convergence in solving high-dimensional multi-objective optimization problems, and can also obtain higher-quality solutions.
Efficient Spiking Neural Architecture Search with Mixed Neuron Models and Variable Thresholds
ABSTRACT. Spiking neural networks (SNNs) are emerging as an energy-efficient alternative to artificial neural networks (ANNs) due to their event-driven computation and ability to process temporal information effectively. While Neural Architecture Search (NAS) has been extensively used to optimize neural network structures, its application to SNNs remains limited. Existing studies often overlook the temporal differences in information propagation between ANNs and SNNs. Instead, they focus on shared structures such as convolutional, recurrent, or pooling modules. This paper introduces a novel neural architecture search framework, MixedSNN, explicitly designed for SNNs. Inspired by the human brain, MixedSNN incorporates a novel search space called SSP, which explores the impact of using mixed spiking neurons and variable thresholds on SNN performance. Additionally, we propose a training-free evaluation strategy called Period-Based Spike Evaluation (PBSE) that leverages spike activation patterns to account for the temporal features in SNNs. The performance of SNN architectures obtained through MixedSNN is evaluated on three datasets: CIFAR10, CIFAR100, and CIFAR10-DVS. Results demonstrate that MixedSNN can achieve state-of-the-art performance with significantly fewer timesteps.
An Interactive Evolutionary Algorithm for Ceramic Formula Design
ABSTRACT. The ceramic industry is a representative traditional industry in Guangdong Province, where its degree of informatization is low and the design of ceramic formulas mainly depends on human experience. Intelligently generating ceramic formulas raises two main challenges: evaluating a ceramic formula by actual firing is expensive, and the accumulated historical data are limited. To address these challenges, this paper models the ceramic formula design process as an expensive constrained multi-objective optimization problem. Based on the mathematical model, we propose an interactive hybrid metaheuristic evolutionary algorithm, cEDA, to optimize the production cost while satisfying the category constraints, chemical component constraints and material constraints. It consists of three key components, nondominated sorting, materials selection and proportion allocation, to search for qualified ceramic formulas. To incorporate domain expertise, a classification-based interactive optimization method is introduced in cEDA. After two rounds of interaction, the acceptance rate of the formulas generated by the algorithm increased from 18% to 87.5%, which demonstrates the effectiveness of the proposed algorithm.
Accelerated Genetic Algorithm with Population Control for Energy-Aware Virtual Machine Placement in Data Centers
ABSTRACT. Energy efficiency is crucial for the operation and management of cloud data centers, which are the foundation of cloud computing. Virtual machine (VM) placement plays a vital role in improving energy efficiency in data centers. The genetic algorithm (GA) has been extensively studied for solving the VM placement problem due to its ability to provide high-quality solutions. However, GA's high computational demands limit further improvement in energy efficiency, where a fast and lightweight solution is required. This paper presents an adaptive population control scheme that enhances gene diversity through population control, an adaptive mutation rate, and accelerated termination. Experimental results show that our scheme achieves a 17% speedup and requires 49% fewer generations compared to the standard GA for energy-efficient VM placement in large-scale data centers.
An Improved YOLOv5s for Glass Tube Defect Detection
ABSTRACT. Existing algorithms for detecting glass tube defects suffer from low recognition accuracy and large model sizes. This paper proposes an improved YOLOv5s to solve these problems. Specifically, the Convolutional Block Attention Module (CBAM) is used in the improved YOLOv5s to enhance feature extraction. Then, the Content-Aware ReAssembly of FEatures (CARAFE) operator replaces the original upsampling method, which benefits feature fusion. Next, the Efficient Intersection over Union (EIoU) loss is substituted for the original Complete Intersection over Union (CIoU) loss of YOLOv5s, which improves the regression accuracy. Finally, we adopt Cosine Warmup to accelerate convergence and improve detection performance. The experimental results show that, compared with the original YOLOv5s, our improved YOLOv5s increases the mean Average Precision (mAP) by 8% while decreasing the number of model parameters by 5.8%. Moreover, the improved detector reaches 98 Frames Per Second (FPS), which meets the requirement of real-time detection.
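A learning-rate schedule of the "Cosine Warmup" kind is commonly a linear ramp followed by cosine decay, which can be sketched as follows. The abstract does not give the exact schedule, so the function name, the linear warmup shape, and the parameters `warmup_steps` and `min_lr` are assumptions for illustration only.

```python
import math


def cosine_warmup_lr(
    step: int,
    total_steps: int,
    warmup_steps: int,
    base_lr: float,
    min_lr: float = 0.0,
) -> float:
    """Linear warmup to base_lr, then cosine decay from base_lr down to min_lr."""
    if step < warmup_steps:
        # Ramp up linearly so the first updates use a small learning rate.
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

The warmup phase keeps early gradients from destabilising pretrained or freshly initialised layers, and the cosine phase lowers the rate smoothly toward the end of training.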
ABSTRACT. Palm oil is an edible vegetable oil that can be used in a wide range of products across different industries, ranging from food and beverages, personal care and cosmetics, animal feed, and industrial products to biofuel. The palm oil industry contributes slightly less than 4% of Malaysia's overall GDP, and Malaysia is the second-largest producer and exporter of palm oil worldwide. It has been estimated that there are around 500,000 plantation workers in the Malaysian palm oil industry. In addition to the difficulty of securing a sufficient and steady supply of such usually low-skilled workers, there are also issues related to the limits of the human body in performing tough physical work. As a result, UAVs may be used to support some of the processes in the palm oil business. However, the batteries used in these UAVs have a finite capacity before they need to be recharged. Hence, the flight path of the UAV should be optimally computed so that it can cover its assigned area. In this paper, an improved Non-Dominated Sorting Genetic Algorithm II (NSGA-II) is developed to compute the optimal flight path of UAVs, which also includes the turning angle and elevation. The algorithm is enhanced by improving the selection, crossover, and mutation operations of the genetic algorithm, which helps to improve the convergence and diversity of the algorithm while avoiding getting trapped in locally optimal solutions. In the majority of the tests, the improved NSGA-II was able to generate paths that are better than those identified by the human expert. Moreover, the proposed improved NSGA-II algorithm was able to compute good paths in less than the threshold of 10 minutes.
Evolutionary Computation for Berth Allocation Problems: A Survey
ABSTRACT. The berth allocation problem (BAP) refers to assigning berthing spaces to incoming vessels while considering various constraints and objectives, and is an important optimization problem in port logistics. Evolutionary computation (EC) algorithms are a class of meta-heuristic optimization algorithms that mimic the process of natural selection to generate and evolve potential solutions to optimization problems. Due to their advantages of great search capability and robustness to uncertainty, EC algorithms have gained significant attention in many research fields. In recent years, multiple studies have successfully applied EC algorithms to solving BAPs and achieved encouraging performance. This paper aims to survey the existing literature on EC algorithms for solving BAPs. First, this survey introduces two common models of BAPs, which are continuous BAPs and discrete BAPs. Second, we introduce three typical EC algorithms (genetic algorithm, particle swarm optimization, and ant colony optimization) and analyze the existing studies that use these EC algorithms to solve BAPs. Finally, we analyze the future research directions of EC algorithms in solving BAPs.
An Adaptive Auxiliary Training Method of Autoencoders and its Application in Anomaly Detection
ABSTRACT. In various applications of autoencoders, an auxiliary subnetwork is used to improve the performance of a neural network with an autoencoder as the key component. For the specific task of anomaly detection, we have observed that in certain cases, when the reconstruction performance reaches a high level, the auxiliary subnetwork becomes ineffective in further improving the autoencoder's performance. This phenomenon results in oscillation and degradation of the overall system. To address this issue, we propose an adaptive auxiliary training method (AAT) that ensures continuous improvement in the autoencoder's reconstruction performance throughout the entire training procedure. AAT enhances the monitoring of the autoencoder's training, enabling adaptive adjustment of the training strategy without a validation set. Additionally, an anomaly detection scheme is devised based on the proposed adaptive auxiliary training method. Experimental results on multiple datasets show that the proposed methods produce autoencoders with better reconstruction and detection performance compared to the state-of-the-art (SOTA) methods.
Event-Triggered Constrained $H_\infty$ Control Using Concurrent Learning and ADP
ABSTRACT. In this paper, an optimal control algorithm based on concurrent learning and adaptive dynamic programming is developed for event-triggered constrained $H_\infty$ control. First, the $H_\infty$ control system under consideration is based on an event-triggered constrained input and a time-triggered external disturbance, which saves resources and reduces the network bandwidth burden. Second, in the implementation of the control scheme, a critic neural network is designed to approximate the unknown value function. Moreover, concurrent learning techniques participate in weight training, making the implementation process simple and effective. Lastly, the stability of the system and the effectiveness of the algorithm are demonstrated through theorem proofs and simulation results.
ABSTRACT. Deep Neural Networks (DNNs) suffer severe performance degradation when encountering domain shift. Previous methods mainly focus on feature manipulation on source domains to learn features that transfer to unseen domains. We propose a new idea based on the attention mechanism, which makes the model learn the most transferable features on source domains and adaptively focus on the most discriminative features on unseen domains. To achieve this goal, we design a domain-specific attention module to find the most transferable features in each domain. Unlike channel attention, our module also encodes spatial information to capture the global structure of samples, which is vital for generalization performance. To reduce parameter overhead, we also introduce a knowledge distillation formulation to train a lightweight model that has the same attention ability as the original model. Specifically, we align the attention weights of the student model with the teacher's domain-specific weights that correspond to the domain of the input. Results show that the distilled model even performs better than its teacher and achieves state-of-the-art performance on several public datasets, e.g., PACS, OfficeHome and VLCS.
AM-RRT*: An Automatic Robot Motion Planning Algorithm based on RRT
ABSTRACT. Motion planning is a very important part of robot technology, where the quality of planning directly affects the energy consumption and safety of robots. To address the shortcomings of traditional RRT methods, such as long, unsmooth paths and a lack of coupling with the robot control system, an automatic robot motion planning method based on the Rapidly-exploring Random Tree (AM-RRT*) is proposed. First, the RRT algorithm was improved by adding attractive potential fields for the target points of the environment, making the sampling process more directional. Then, a path optimization method based on a dynamic model and cubic B-spline curves was designed to couple the planned path with the robot controller. Finally, an RRT speed planning algorithm was added to the planned path to avoid dynamic obstacles in real time. To verify the feasibility of AM-RRT*, a detailed comparison was made between AM-RRT* and the traditional RRT series algorithms. The results showed that AM-RRT* overcame the shortcomings of RRT and was more suitable for robot motion planning in a dynamic environment. AM-RRT* can provide a new approach for robots to replace human labor in complex environments such as underwater sites, nuclear power plants, and mines.
Human-guided Transfer Learning for Autonomous Robot
ABSTRACT. In recent years neural networks have been successfully applied to many problems. However, prohibitively long learning time and vast training data are sometimes unavoidable. The learning time is crucial for real-time learning of autonomous robots in physical environments. One way to alleviate this problem is through transfer learning, which applies knowledge from one domain to another. In this study, we propose a method for transferring human common sense for guiding the subsequent reinforcement learning of a robot for real-time learning in a more complex environment. The efficacy of the transfer mechanism is analyzed to obtain new insights on the required prior knowledge to be transferred for training an autonomous robot. Different types of prior knowledge were analyzed and explained in this paper.
Computer Simulations of Applying Zhang Inequation Equivalency and Solver of Neurodynamics to Redundant Manipulators at Acceleration Level
ABSTRACT. An equation can be transformed into an equivalent equation at a different level, which is termed equation equivalence or even generalized to be equation equivalency. In recent years, Zhang equivalency, more specifically, Zhang equation equivalency, i.e., a new equation equivalency originated from Zhang neurodynamics, has been proposed and investigated. Referring to Zhang equivalency and doing careful investigation, we similarly find that an inequation can also be transformed into an equivalent inequation at a different level. The novel inequation equivalency named Zhang inequation equivalency (ZIE) is investigated in this paper. Then, ZIE is applied to acceleration-level redundant manipulator motion control. The configuration adjustment and cyclic motion generation of two types of redundant manipulators are investigated and simulated. Comparative experimental results verify the validity of the proposed ZIE. In fact, ZIE can also be applied in different actual projects according to practical requirements.
DOS Dataset: A Novel Indoor Deformable Object Segmentation Dataset for Sweeping Robots
ABSTRACT. Path planning for sweeping robots requires avoiding specific obstacles, particularly deformable objects such as socks, ropes, faeces, and plastic bags. These objects can cause secondary pollution or hinder the robot's cleaning capabilities. However, there is a lack of specific datasets for deformable obstacles in indoor environments. Existing datasets either focus on outdoor scenes or lack semantic segmentation annotations for deformable objects. In this paper, we introduce the first dataset for detecting and segmenting deformable objects in indoor sweeping robot scenarios, the DOS Dataset. We believe that DOS will catalyze research in semantic segmentation of deformable objects for indoor robot obstacle avoidance applications.
Learning Stable Nonlinear Dynamical System from One Demonstration
ABSTRACT. Dynamical systems (DS) methods constitute one of the most commonly employed frameworks for Learning from Demonstration (LfD). The field of LfD aims to enable robots or other agents to learn new skills or behaviors by observing human demonstrations, and DS provide a powerful tool for modeling and reproducing such behaviors. Due to their ability to capture complex and nonlinear patterns of movement, DS have been successfully applied in robotics applications. This paper presents a new learning from demonstration method using DS. The proposed method ensures that the learned systems achieve global asymptotic stability, a valuable property that guarantees the convergence of the system to an equilibrium point from any initial condition. The original trajectory is first transformed to a higher-dimensional space and then subjected to a diffeomorphic transformation, which maps the transformed trajectory to a straight line that converges towards the zero point. By deforming the trajectories in this way, the resulting system ensures global asymptotic stability for all generated trajectories.
CrowdNav-HERO: Pedestrian Trajectory Prediction Based Crowded Navigation with Human-Environment-Robot Ternary Fusion
ABSTRACT. Navigating safely and efficiently in complex and crowded scenarios is a challenging problem of practical significance. A realistic and cluttered environmental layout usually has a significant impact on crowd distribution and robotic motion decision-making during crowded navigation. However, most previous methods either learn and evaluate navigation strategies in unrealistic barrier-free settings or assume that expensive features such as pedestrian speed are available, even though accurately measuring pedestrian speed in large-scale scenarios is itself a difficult problem. To fully investigate the impact of static environment layouts on crowded navigation and alleviate the reliance of robots on costly features, we propose a novel crowded navigation framework with Human-Environment-Robot (HERO) ternary fusion named CrowdNav-HERO. Specifically, (i) a simulator that integrates an agent, a variable number of pedestrians, and a series of realistic environments is customized to train and evaluate crowded navigation strategies. (ii) Then, a pedestrian trajectory prediction module is introduced to eliminate the dependence of navigation strategies on pedestrian speed features. (iii) Finally, a novel crowded navigation strategy is designed by combining the pedestrian trajectory predictor and a layout feature extractor. Comparative analysis and benchmark tests demonstrate the superiority of our approach in terms of success rate, collision rate, and cumulative rewards. The code is published at https://github.com/SiyiLoo/CrowdNav-HERO.
PyraBiNet: A Hybrid Semantic Segmentation Network Combining PVT and BiSeNet for deformable objects in indoor environments
ABSTRACT. In this study, we introduce PyraBiNet, an innovative hybrid model optimized for lightweight semantic segmentation tasks. This model ingeniously merges the merits of Convolutional Neural Networks (CNNs) and Transformers. We propose a dual-branch structure that strategically employs the global feature extraction capabilities of the Pyramidal Vision Transformer (PVT) and the local feature extraction proficiency of BiSeNet. Specifically, the global feature branch employs a transformer from PVT to harness high-level patterns from input images, while the local feature branch utilizes a CNN, inspired by BiSeNet, to extract fine-grained details. Comprehensive evaluations conducted on the ADE20K and DOS datasets underscore PyraBiNet's superior performance compared to the existing state-of-the-art methods. With its effective and efficient performance, PyraBiNet proves to be an invaluable asset in the domain of mobile robotics, particularly beneficial for applications such as sweeping robots.
New Stability Criteria for Markov Jump Systems under DoS Attacks and Packet Loss via Dynamic Event-triggered Control
ABSTRACT. In this paper, the exponential mean square stability of Markov jump systems under packet loss and denial-of-service (DoS) attacks is studied via dynamic event-triggered control. Unlike existing results, this paper considers not only the impact of periodic DoS attacks on the system but also random packet loss during the sleeping period of DoS attacks, which has more practical application value. Firstly, the Bernoulli distribution is used to model random packet loss, and the zero-input strategy is used to counter the impact of DoS attacks and random packet loss on the system. Then, based on the dynamic event-triggered mechanism, different controllers are designed for the action period and the sleeping period of DoS attacks, and a new switched Markov jump system model is obtained using the input delay method. Unlike the previous piecewise Lyapunov-Krasovskii functional approach, this paper obtains less conservative stability criteria by constructing a common Lyapunov-Krasovskii functional. Finally, the validity of the proposed method is illustrated through a simulation experiment.
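The channel model described above can be illustrated with a small sketch: a periodic DoS attack alternates between an action period and a sleeping period, packets sent during the sleeping period are dropped according to a Bernoulli variable, and the zero-input strategy applies a control of zero whenever nothing arrives. All names and parameters (`dos_active`, `period`, `active_len`, `loss_prob`) are hypothetical, time is discretised for simplicity, and the event-triggered mechanism and Markov jump dynamics are not modelled.

```python
import random


def dos_active(t: int, period: int, active_len: int) -> bool:
    """Periodic DoS: the first active_len steps of every period are under attack."""
    return (t % period) < active_len


def packet_delivered(loss_prob: float, rng: random.Random) -> bool:
    """Bernoulli packet loss: the packet is lost with probability loss_prob."""
    return rng.random() >= loss_prob


def applied_input(
    u: float, t: int, period: int, active_len: int, loss_prob: float, rng: random.Random
) -> float:
    """Zero-input strategy: apply 0 if the channel is attacked or the packet is lost."""
    if dos_active(t, period, active_len):
        return 0.0
    return u if packet_delivered(loss_prob, rng) else 0.0
```

With `loss_prob = 0` the sketch reduces to a pure periodic DoS model, which is the case the abstract says earlier results were limited to.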
From Incompleteness to Unity: A Framework for Multi-view Clustering with Missing Values
ABSTRACT. The assumption of data completeness plays a significant role in the effectiveness of current Multi-view Clustering (MVC) methods. However, data collection and transmission would unavoidably breach this assumption, resulting in the Partially Data-missing Problem (PDP). A common remedy is to first impute missing values and then conduct MVC methods, which may cause performance degeneration due to inaccurate imputation. To address these issues in PDP, we propose a novel imputation-free framework with matrix correction techniques and adopt a new two-stage strategy, i.e., correction-clustering. Specifically, we first correct affinity matrices estimated from incomplete data to improved estimates with a theoretical guarantee, and then combine them with affinity-based MVC methods to perform clustering, which naturally avoids the uncertainty error from inaccurate imputation and benefits clustering tasks. Extensive experiments demonstrate that our strategy achieves superior and robust clustering performance under a wide range of missing ratios, compared to classical imputation-based approaches.
Partial Multi-label Learning via Constraint Clustering
ABSTRACT. Multi-label learning (MLL) refers to a learning task where each instance is associated with a set of labels. However, in most real-world applications, the labeling process is very expensive and time-consuming. Partial multi-label learning (PML) refers to MLL where only a part of the labels are correctly annotated, and the rest are false-positive labels. The main purpose of PML is to learn and predict unseen multi-label data with less annotation cost. To address the ambiguities in the label set, existing popular PML research attempts to extract the label confidence for each candidate label. These methods mainly perform disambiguation by considering the correlation among labels and/or features. However, because of noisy labels in PML, the true correlation among labels is corrupted, and these methods can be easily misled by noisy false-positive labels. In this paper, we propose a Partial Multi-Label learning method via Constraint Clustering (PML-CC) to address PML based on the underlying structure of the data. PML-CC gradually extracts high-confidence labels and then uses them to extract the remaining labels. To find the high-confidence labels, it solves PML as a clustering task while treating information extracted in previous steps as constraints. In each step, PML-CC updates the extracted labels and uses them to extract the other labels. Experimental results show that our method successfully tackles PML tasks and outperforms the state-of-the-art methods on artificial and real-world datasets.
Attention Based Spatial-Temporal Dynamic Interact Network for Traffic Flow Forecasting
ABSTRACT. The prediction of spatio-temporal traffic flow data is challenging due to the complex dynamics among different roads. Existing approaches often focus on capturing traffic patterns at a single temporal granularity, disregarding spatio-temporal interactions and relying heavily on prior knowledge. However, this limits the generality of the models and their ability to adapt to dynamic changes in traffic patterns. We argue that traffic flow changes co-occur in the road network's temporal and spatial dimensions, which leads to commonalities and regularities in the data across these dimensions, with their dynamic changes depending on the temporal granularity. In this research, we propose an attention based spatio-temporal dynamic interaction network consisting of a spatio-temporal interaction filtering module and a spatio-temporal dynamic perception module. The interaction filtering module captures commonalities and regularities from a global perspective, ensuring adherence to the temporal and spatial dimensions of the road network structure. The dynamic perception module incorporates a sliding window attention mechanism to capture local dynamic correlations between the temporal and spatial dimensions at different time granularities. To address the issue of time series span, we design a more adaptive time-aware attention mechanism that effectively captures the impact of time intervals. Extensive experiments on four real-world datasets demonstrate that our approach achieves state-of-the-art performance and consistently outperforms other baseline methods. The source code is available at https://github.com/JunweiXie/ASTDIN.
Towards High-Performance Exploratory Data Analysis (EDA) via Stable Equilibrium Point
ABSTRACT. Exploratory data analysis (EDA) is a vital procedure in data science projects. In this work, we introduce a stable equilibrium point (SEP)-based framework for improving the performance of EDA. By using the SEPs as representative points, our approach aims to generate high-quality clustering and data visualization for real-world data sets. A unique property of the proposed method is that the SEPs directly encode the clustering properties of data sets. Compared with prior state-of-the-art clustering and data visualization methods, the proposed methods substantially improve solution quality for large-scale data analysis tasks. For instance, for the USPS data set, our method achieves more than 10% clustering accuracy gain over the standard spectral clustering algorithm and a 3X speedup for the t-SNE visualization.
Time Series Anomaly Detection with a Transformer Residual Autoencoder-Decoder
ABSTRACT. Time series anomaly detection is of great importance in a variety of domains such as financial fraud, industrial production, and information systems. However, due to the complexity and multiple periodicity of time series, extracting global and local information from different perspectives remains a challenge. In this paper, we propose a novel Transformer Residual Autoencoder-Decoder model called TRAD for time series anomaly detection, which is based on a multi-interval sampling strategy combined with residual learning and a stacked autoencoder-decoder to improve the ability to learn global and local information. The prediction error of the proposed model at different scales is used to calculate anomaly scores, and the aggregated anomaly scores are used to infer outliers of the time series. Extensive experiments are conducted on five datasets, and the results demonstrate that the proposed model outperforms the previous state-of-the-art baselines.
Multi-Scale Multi-Step Dependency Graph Neural Network for Multivariate Time Series Forecasting
ABSTRACT. This study addressed the limitations of existing graph neural network methods in time-series prediction, specifically the inability to establish strong dependencies between variables and the weak correlation in time-series across different time scales. To overcome these challenges, we proposed a graph neural network-based multi-scale multistep dependency (GMSSD) model. To capture temporal dependencies in time-series data, we first designed a temporal convolution module that learns multi-scale representations between sequences. We extracted features at multiple scales using dilated convolutions and a gated linear unit (GLU) while controlling the information flow, thereby capturing temporal dependencies in time-series data. Furthermore, we employed a gated recurrent unit (GRU) and fully connected layers to derive the graph structure and capture the complex relationships between variables in the sequence data. In particular, existing graph neural network methods have a strong dependence on graph structures and are unable to adapt to complex and dynamic graph structures. They also have limitations in capturing long-range dependency relationships within the graph. Therefore, a graph convolution module is designed to explore the current node information and its neighbor information. It has the capability to integrate information contributions from different time steps, effectively capturing the spatial dependencies among nodes. The experimental results show that the proposed model outperformed existing methods in both single-step and multi-step prediction tasks. This study provided a novel approach for time-series forecasting and achieved significant improvements.
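The multi-scale feature extraction described above (dilated convolutions combined with a gated linear unit) can be illustrated with a minimal, dependency-free sketch. This is not the GMSSD implementation: the function names, the kernel weights, and the choice of a causal, single-channel convolution are all assumptions made for illustration.

```python
import math


def dilated_causal_conv(x, weights, dilation):
    """1-D causal convolution: y[t] = sum_k weights[k] * x[t - k * dilation], zero-padded on the left."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gated_dilated_block(x, value_weights, gate_weights, dilation):
    """GLU-style gating: the value branch is scaled element-wise by the sigmoid of the gate branch."""
    value = dilated_causal_conv(x, value_weights, dilation)
    gate = dilated_causal_conv(x, gate_weights, dilation)
    return [v * sigmoid(g) for v, g in zip(value, gate)]


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
# Hypothetical weights; a kernel of size 2 with dilation 1, 2 and 4 looks back 1, 2 and 4 steps.
multi_scale = {d: gated_dilated_block(signal, [0.5, 0.5], [1.0, -1.0], d) for d in (1, 2, 4)}
```

Larger dilations widen the receptive field without adding parameters, while the sigmoid gate controls how much of each scale's output is passed on.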
Time-Series Forecasting through Contrastive Learning with a Two-Dimensional Self-Attention Mechanism
ABSTRACT. Contrastive learning methods have impressive capabilities in time-series representation; however, challenges in capturing contextual consistency and extracting features that meet the requirements of representation learning remain. To address these problems, this study proposed a time-series prediction contrastive learning model based on a two-dimensional self-attention mechanism. The main innovations of this model were as follows: First, long short-term memory (LSTM) adaptive pruning was used to form two subsequences with overlapping parts to provide robust context representation for each timestamp. Second, the model extracted sequence data features in both global and local dimensions. In the channel dimension, the model encoded sequence data using a combination of a self-attention mechanism and dilated convolution to extract key features for capturing long-term trends and periodic changes in data. In the spatial dimension, the model adopted a sliding-window self-attention mechanism to encode sequence data, thereby improving its perceptual ability for local features. Finally, the model introduced a self-correlation attention mechanism that converted the similarity calculation from the real domain to the frequency domain through a Fourier transform, better capturing the periodicity and trends in the data. The experimental results showed that the proposed model outperformed existing models in multiple time-series prediction tasks, demonstrating its effectiveness and feasibility in time-series prediction tasks.
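The idea of moving a similarity calculation into the frequency domain can be sketched with the Wiener-Khinchin relation, under which the circular autocorrelation of a series equals the inverse Fourier transform of its power spectrum. The sketch below is an assumption about the general technique, not the paper's attention mechanism; the helper names are hypothetical, and a naive DFT is used only to avoid dependencies (an FFT routine would be used in practice).

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform, kept dependency-free for illustration."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def circular_autocorrelation(x):
    """Circular autocorrelation via the Wiener-Khinchin relation: IDFT(|DFT(x)|^2)."""
    spectrum = dft(x)
    power = [abs(c) ** 2 for c in spectrum]
    return [c.real for c in dft(power, inverse=True)]


def dominant_lag(x):
    """Return the non-zero lag below half the series length with the highest autocorrelation."""
    r = circular_autocorrelation(x)
    half = len(x) // 2
    return max(range(1, half), key=lambda lag: r[lag])


# A sine wave with period 8 over 32 samples: the strongest lag found is the period itself.
series = [math.sin(2 * math.pi * t / 8) for t in range(32)]
lag = dominant_lag(series)
```

Peaks in the autocorrelation mark lags at which the series resembles itself, which is how periodicity shows up in this kind of similarity score.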
Ignored Details in Eyes: Exposing GAN-generated Faces by Sclera
ABSTRACT. Advances in generative adversarial networks (GANs) have significantly improved the quality of synthetic facial images, posing threats to many vital areas. Thus, identifying whether a presented facial image is synthesized is of forensic importance. Our fundamental discovery is the lack of capillaries in the sclera of GAN-generated faces. This deficiency is caused by the lack of physical/physiological constraints in the GAN model. Because capillaries are present, to a greater or lesser extent, in real human eyes, one can distinguish real faces from GAN-generated ones by carefully examining the sclera area. Following this idea, we first extract the sclera area from a probe image, then feed it into a residual attention network to distinguish GAN-generated faces from real ones. The proposed method is validated on the Flickr-Faces-HQ and StyleGAN2/StyleGAN3-generated face datasets. Experiments demonstrate that the capillaries in the sclera are a very effective feature for identifying GAN-generated faces.
HANCaps: A Two-Channel Deep Learning Framework for Fake News Detection in Thai
ABSTRACT. The rapid advancement of internet technology, widespread smartphone usage, and the rise of social media platforms have drastically transformed the global communication landscape. These developments have resulted in both positive and negative consequences. On one hand, they have facilitated the dissemination of information, connecting individuals across vast distances and fostering diverse perspectives. On the other hand, the ease of access to online platforms has led to the proliferation of misinformation, often in the form of fake news. Detecting and combatting fake news has become crucial to mitigate its adverse effects on society. This paper presents an investigation into fake news detection in the Thai language, addressing the current limitations in this domain by proposing a novel two-channel deep learning model, called HANCaps, which integrates BERT and FastText embeddings with a hierarchical attention network and capsule network. The HANCaps model utilizes the BERT language model as one channel input, while the other channel incorporates pre-trained FastText embeddings. The proposed model is evaluated using a benchmark Thai fake news dataset, and extensive experiments demonstrate that HANCaps surpasses state-of-the-art methods by up to 3.28% in terms of F1 score, showcasing its superior performance.
Solving the inverse problem of laser with complex-valued field by physics-informed neural networks
ABSTRACT. In the resonator of an actual laser oscillator, the complex-valued laser field is extracted from the gain. The inverse problem of the laser is to construct the gain from a given complex-valued laser field, which is essential for design purposes. However, this is a challenge for conventional numerical methods because the governing equations cannot be solved inversely. In this paper, a deep learning method based on physics-informed neural networks (PINNs) is introduced to solve the inverse laser problem. The complex-valued laser field and partial differential equation are divided into real and imaginary parts because the optimizer of neural networks cannot deal with the differentiation of complex values. A given paraxial wave equation is used as an example to validate the performance of the method. The comparison between the predictions of PINNs and fast Fourier transform numerical solutions shows that the accuracy of the gain is 93.22%. This method can be generalized to laser design and optimization problems.
Solving Localized Wave Solutions of the Nonlinear PDEs using Physics-Constraint Deep Learning Method
ABSTRACT. In recent years, with the rapid development of deep learning, its related technologies have started to be applied in the field of scientific computing. As a typical representative, Physics-Informed Neural Networks (PINNs) have been successfully applied to solve partial differential equations (PDEs) and multiphysics simulations, demonstrating tremendous potential and attracting a high level of attention from researchers. In the field of nonlinear science, localized waves have significant research value, and their theories have been applied in many fields. PDEs are an important tool for studying localized waves of nonlinear systems, and numerical methods for PDEs are widely used in numerical simulations of localized waves. However, traditional numerical methods are often computation-intensive and time-consuming. In this paper, we apply improved PINNs to solve for localized wave solutions of PDEs. The improved PINNs not only embed the constraints of PDEs but also add constraints on gradient information, further enriching the physical constraints of the neural network model. Additionally, we adopt an adaptive learning method to update the weight coefficients of the loss function and dynamically adjust the proportions of each constraint term in the entire loss function to speed up the training process. In the experimental section, we select the high-order KdV equation, the Boussinesq equation, and the nonlinear Schrödinger equation (NLSE) for the study and evaluate the accuracy of localized wave simulation results through error analysis. The experimental results indicate that the improved PINNs are significantly better than traditional PINNs, with shorter training time and more accurate prediction results.
Measuring Cognitive Load: Leveraging fNIRS and Machine Learning for Classification of Workload Levels
ABSTRACT. Measuring cognitive load, a subjective construct that reflects the mental effort required for a given task, remains a challenging endeavor. While Functional Near-Infrared Spectroscopy (fNIRS) has been utilized in the field of neurology to assess cognitive load, there are limited studies that have specifically focused on high cognitive load scenarios. Previous research in the field of cognitive workload assessment using fNIRS has primarily focused on differentiating between two levels of mental workload. These studies have explored the classification of low and high levels of cognitive load, or easy and difficult tasks, using various Machine Learning (ML) and Deep Learning (DL) models. However, there is a need to further investigate the detection of multiple levels of cognitive load to provide more fine-grained information about the mental state of an individual. This study aims to classify four mental workload levels using classical ML techniques, specifically random forests, with fNIRS data. It assesses the effectiveness of ML algorithms with fNIRS data, provides insights into classification features and patterns, and contributes to understanding neural mechanisms in cognitive processing. ML algorithms used for classification include Naïve Bayes, k-Nearest Neighbors (k-NN), Decision Trees, Random Forests, and Nearest Centroid. Random Forests achieved a promising accuracy of around 99.99% and an Area Under Curve (AUC) of 0.6668. The findings of this study highlight the potential of utilizing fNIRS and ML algorithms for accurately classifying cognitive workload levels. The use of multiple features extracted from fNIRS data may contribute to a more robust and reliable classification approach.
Recursive Constrained Maximum Versoria Criterion Algorithm for Adaptive Filtering
ABSTRACT. This paper proposes a recursive constrained maximum Versoria criterion (RCMVC) algorithm. In comparison with recursive competing methods, our proposed RCMVC can achieve smaller steady-state misalignment in non-Gaussian noisy environments. Specifically, we use the maximum Versoria criterion to derive a new robust recursive constrained adaptive filtering within the least-squares framework for solving linearly constrained problems. For RCMVC, we analyze the mean-square stability and characterize the theoretical transient mean square deviation (MSD) performance. Furthermore, we conduct some simulations to validate the consistency between the analytical and simulation results and show the effectiveness of RCMVC in non-Gaussian noisy environments.
Actor-Critic with variable time discretization via sustained actions
ABSTRACT. Reinforcement learning (RL) methods work in discrete time. In order to apply RL to inherently continuous problems like robotic control, a specific time discretization needs to be defined. This is a choice between sparse time control, which may be easier to train, and finer time control, which may allow for better ultimate performance. In this work, we propose SusACER, an off-policy RL algorithm that combines the advantages of different time discretization settings. Initially, it operates with sparse time discretization and gradually switches to a fine one. We analyze the effects of the changing time discretization in robotic control environments: Ant, HalfCheetah, Hopper, and Walker2D. In all cases, our proposed algorithm outperforms the state of the art.
Two-Stream Spectral-Temporal Denoising Network for End-to-end Robust EEG-based Emotion Recognition
ABSTRACT. Emotion recognition based on electroencephalography (EEG) is attracting more and more interest in affective computing. Previous studies have predominantly relied on manually extracted features from EEG signals. The utilization of raw EEG signals, which contain more temporal information but present a significant challenge due to their abundance of redundant data and susceptibility to contamination from other physiological signals, such as electrooculography (EOG) and electromyography (EMG), remains largely unexplored. To cope with the high dimensionality and noise interference in end-to-end EEG-based emotion recognition tasks, we introduce a Two-Stream Spectral-Temporal Denoising Network (TS-STDN), which takes into account the spectral and temporal aspects of EEG signals. Moreover, two U-net modules are adopted to reconstruct clean EEG signals in both spectral and temporal domains while extracting discriminative features from noisy data for classifying emotions. Extensive experiments are conducted on two public datasets, SEED and SEED-IV, with the original EEG signals and the noisy EEG signals contaminated by EMG signals. Compared to the baselines, our TS-STDN model exhibits a notable improvement in accuracy, demonstrating an increase of 6% and 8% on the clean data and 11% and 10% on the noisy data, which shows the robustness of the model.
Brain-inspired Binaural Sound Source Localization Method Based On Liquid State Machine
ABSTRACT. Binaural sound source localization (BSSL) is an important topic in robot design and human hearing aids. A large number of algorithms have flourished thanks to a leap in machine learning. However, prior approaches lack the ability to make a trade-off between parameter size and accuracy, which is a main obstacle to their further implementation on resource-constrained devices. Spiking Neural Network (SNN)-based models have also emerged owing to their inherent computational advantage of sparse event processing. Liquid State Machine (LSM) is a classic spiking recurrent neural network with a natural potential for processing spatiotemporal information, and it has proved advantageous on numerous tasks since it was proposed. To the best of our knowledge, this is the first BSSL model based on LSM, and we name it BSSL-LSM. BSSL-LSM is lightweight, with only 1.04M parameters, a large reduction compared to CNN (10.1M) and D-BPNN (2.23M), while maintaining comparable or even superior accuracy. Compared to SNN-IID, there is a 10% accuracy improvement for 10° interval localization. To achieve better performance, we introduce Bayesian optimization for hyperparameter search and a novel soft label technique for better differentiating adjacent angles, which can be easily adopted in related works. Project page: https://github.com/BSSL-LSM
How do native and non-native listeners differ? Investigation with dominant frequency bands in auditory evoked potential
ABSTRACT. EEG signal provides valuable insights into cortical responses to specific exogenous stimuli, including auditory and visual stimuli. This study investigates the evoked potential in EEG signals and dominant frequency bands for native and non-native subjects. Songs in different languages were played to subjects using conventional in-ear phones or bone-conducting devices. Time-frequency analysis was performed to characterise induced and evoked responses in the EEG signal, focusing on the phase synchronisation level of the evoked potential as a significant feature. Metrics such as phase locking value (PLV) and weighted phase lag index (WPLI) were used to assess the phase synchrony between the EEG signal and sound signal, while the frequency-dependent effective gain was analysed to understand its impact. The results demonstrated that native subjects experienced higher levels of evoked potential, indicating more complex cognitive neural processes compared to non-native subjects. Dominant frequency windows associated with higher levels of evoked potential were identified using a peak-picking algorithm. Interestingly, the choice of playing device had minimal influence on the evoked potential, suggesting similar outcomes with both in-ear phones and bone-conducting devices. This study provides valuable insights into the neural processing differences between native and non-native subjects and highlights the potential impact of playing devices on the evoked potential.
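The phase locking value mentioned above is commonly defined as the magnitude of the time-averaged unit phasor of the phase difference between two signals. The sketch below assumes that definition and assumes the instantaneous phases are already available (for example from a Hilbert transform, which is not shown). The function name and test signals are illustrative and not taken from the study.

```python
import cmath
import math


def phase_locking_value(phase_a, phase_b):
    """Return the PLV of two equally long sequences of instantaneous phases (radians)."""
    if len(phase_a) != len(phase_b) or not phase_a:
        raise ValueError("phase sequences must be non-empty and of equal length")
    # Average the unit phasors of the phase difference, then take the magnitude.
    mean_phasor = sum(cmath.exp(1j * (a - b)) for a, b in zip(phase_a, phase_b)) / len(phase_a)
    return abs(mean_phasor)


# Constant phase lag: the two signals are perfectly locked, so the PLV is 1 up to rounding.
t = [k / 100 for k in range(200)]
locked_a = [2 * math.pi * 5 * x for x in t]
locked_b = [p - 0.7 for p in locked_a]
plv_locked = phase_locking_value(locked_a, locked_b)

# Phase difference sweeping uniformly over two full cycles: the phasors cancel and the PLV is near 0.
drift_b = [p - 2 * math.pi * x for p, x in zip(locked_a, t)]
plv_drift = phase_locking_value(locked_a, drift_b)
```

A PLV near 1 indicates a stable phase relationship regardless of the size of the lag, while a value near 0 indicates no consistent phase relationship.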
Cancellable iris recognition scheme based on inversion fusion and local ranking
ABSTRACT. Iris recognition has gained significant attention and application in real-life and financial scenarios in recent years due to its importance as a biometric data source. While many proposed solutions boast high recognition accuracy, one major concern remains the effective protection of users' iris data and prevention of privacy breaches, an aspect often lacking in existing solutions. To address this issue, we propose an improved cancellable biometrics scheme based on the inversion fusion and local ranking strategy (IFCB), specifically targeting the vulnerability of the local ranking-based cancellable biometrics scheme (LRCB) to the ranking-inversion attack when recognition accuracy is high. The proposed method disrupts the original iris data by applying a random substitution string and rearranges blocks within each iris string, either inverting or keeping them unchanged. This combination of inverted and unchanged blocks, referred to as inversion fusion, is then sorted to obtain rank values that are stored for subsequent matching. It is important to note that the inversion fusion step may lead to a loss of accuracy, which can be compensated for by amplifying the iris data. By utilizing a set of different random substitution strings, the rearranged iris strings are employed in both the inversion fusion and local ranking steps. A long iris template is generated and stored as the final protected iris template, forming the basis of the proposed IFCB method. Theoretical and experimental analyses demonstrate that the IFCB scheme effectively withstands ranking-inversion attacks and achieves a favorable balance of accuracy, irreversibility, unlinkability, and revocability.