ICONIP2023: THE 30TH INTERNATIONAL CONFERENCE ON NEURAL INFORMATION PROCESSING
PROGRAM FOR TUESDAY, NOVEMBER 21ST

08:30-11:50 Session Poster Session 1: Poster Session
Differential fault analysis against AES based on a hybrid fault model

ABSTRACT. In this paper, a differential fault analysis (DFA) based on a hybrid fault model is proposed. The hybrid fault model comprises one-byte and multibyte faults injected into the state. Through both theory and simulation, the keys of AES-128, AES-192, and AES-256 are successfully derived with two, three, and four pairs of faulty ciphertexts, respectively, without exhaustive search (a pair of faulty ciphertexts refers to the correct ciphertext and the corresponding faulty ciphertext obtained after injecting faults). Compared with the latest methods, the proposed method only requires faults injected in a single round, so it is easier for an attacker to carry out. For AES-192, fewer faulty ciphertexts are needed. In addition, for both AES-192 and AES-256, our method requires a smaller depth of induced faults (the entire key can be retrieved by inducing faults only in round T-2). Thus, the DFA proposed in this article is more efficient.

Single Feedback Based Kernel Generalized Maximum Correntropy Adaptive Filtering Algorithm

ABSTRACT. This paper presents a novel single feedback based kernel generalized maximum correntropy (SF-KGMC) algorithm by introducing a single delay into the framework of kernel adaptive filtering. In SF-KGMC, the history information implicitly contained in the single delayed output can enhance the convergence rate. Compared to the second-order statistics criterion, the generalized maximum correntropy (GMC) criterion shows better robustness against outliers. Therefore, SF-KGMC can efficiently reduce the influence of impulsive noise and avoid significant performance degradation. In addition, a theoretical convergence analysis of SF-KGMC is conducted. Simulation results on chaotic time-series prediction and real-world data applications validate that SF-KGMC achieves better filtering accuracy and a faster convergence rate.
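
The robustness claim in this abstract can be illustrated with a minimal sketch. It assumes the commonly used generalized Gaussian kernel form for generalized correntropy; the function names, the parameter values `alpha` and `beta`, and the sample errors are hypothetical and are not taken from the paper, and the kernel adaptive filter itself is not reproduced here.

```python
import math

def generalized_gaussian_kernel(e, alpha=2.0, beta=1.0):
    """Generalized Gaussian density evaluated at error e (assumed kernel form)."""
    norm = alpha / (2.0 * beta * math.gamma(1.0 / alpha))
    return norm * math.exp(-abs(e / beta) ** alpha)

def gmc_loss(errors, alpha=2.0, beta=1.0):
    """Average of (kernel peak - kernel value); minimizing it maximizes correntropy."""
    peak = generalized_gaussian_kernel(0.0, alpha, beta)
    return sum(peak - generalized_gaussian_kernel(e, alpha, beta) for e in errors) / len(errors)

def mse_loss(errors):
    """Second-order criterion, shown for comparison."""
    return sum(e * e for e in errors) / len(errors)

# Hypothetical prediction errors: small errors plus one impulsive outlier.
clean = [0.1, -0.2, 0.05, 0.15]
noisy = clean + [50.0]

print("MSE clean/noisy:", mse_loss(clean), mse_loss(noisy))
print("GMC clean/noisy:", gmc_loss(clean), gmc_loss(noisy))
```

Each term of the correntropy-based loss is bounded by the kernel peak, so a single outlier can shift it only by a bounded amount, whereas the squared error grows without limit.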

Unsupervised Feature Selection Using Both Similar and Dissimilar Structures

ABSTRACT. Unsupervised feature selection is widely used to handle the rapidly increasing volume of complex, high-dimensional, sparse data without labels. Many good methods have been proposed, most of which mainly consider the relationships between similar data points, and a large proportion of them rely on graph embedding theory. Despite their achievements, the existing methods neglect the information from the most dissimilar data. In this paper, we follow the research line of graph embedding and present a novel method for unsupervised feature selection. Two viewpoints, positive and negative, are used to preserve the data structure after feature selection. Besides a Laplacian matrix that preserves the structure of the most similar data, we build an additional Laplacian matrix to preserve the structure of the least similar data. Furthermore, an efficient algorithm is designed by virtue of the existing generalized powered iteration method. Extensive experiments on four benchmark data sets are conducted to verify the state-of-the-art effectiveness and superiority of the proposed method.
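
As a rough illustration of the two Laplacian matrices mentioned in this abstract, the sketch below builds one graph from nearest neighbours and one from farthest neighbours and forms the unnormalized Laplacian L = D - W for each. The binary edge weights, the choice of `k`, the helper names, and the toy points are all assumptions made for illustration; the paper's actual graph construction and optimization are not described in the abstract.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def neighbour_graph(points, k, farthest=False):
    """Symmetric binary adjacency linking each point to its k nearest
    (or, with farthest=True, its k farthest) other points."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: squared_distance(points[i], points[j]),
            reverse=farthest,
        )
        for j in others[:k]:
            w[i][j] = w[j][i] = 1.0
    return w

def laplacian(w):
    """Unnormalized graph Laplacian L = D - W, with D the diagonal degree matrix."""
    n = len(w)
    return [[(sum(w[i]) if i == j else 0.0) - w[i][j] for j in range(n)]
            for i in range(n)]

# Toy data: two tight clusters that are far apart.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
L_similar = laplacian(neighbour_graph(points, k=1))
L_dissimilar = laplacian(neighbour_graph(points, k=1, farthest=True))

print(L_similar[0])
print(L_dissimilar[0])
```

In this toy case the similar-structure graph links points within a cluster, while the dissimilar-structure graph links points across clusters.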

Applications of Deep Learning Methods in the Diagnosis of Coronary Heart Disease based on Electronic Health Record

ABSTRACT. With the development of Internet technology, the volume of electronic medical record data has surged. At the same time, artificial intelligence simulates the human ability to solve problems and make decisions through complex and fast computation over large amounts of data. Using the electronic medical records of hypertension patients in Peking University Shenzhen Hospital, we propose the use of deep learning models to achieve intelligent coronary heart disease diagnosis for hypertensive patients. We conducted statistical data analysis and effective feature selection. Through feature engineering experiments, the disease risk variables needed for modeling were determined. Then, we established a diagnostic model for coronary heart disease in hypertensive patients based on the Transformer model and comparative learning. The model integrates multiple types of medical record data, such as the patient's personal information, symptoms, concurrent diseases, and test data. The results show that our model achieved the best classification performance and accuracy compared with CNN, RNN, and LSTM, with AUC values for the tasks of coronary heart disease and arrhythmia diagnosis reaching 0.9349 and 0.8024, respectively. In the future, this model can be extended to the diagnosis of general chronic diseases.

Towards Deeper and Better Multi-view Feature Fusion for 3D Semantic Segmentation

ABSTRACT. 3D point clouds are rich in geometric structure information, while 2D images contain important and continuous texture information. Combining 2D information to achieve better 3D semantic segmentation has become mainstream in 3D scene understanding. Despite this success, it remains unclear how to fuse and process the cross-dimensional features from these two distinct spaces. Existing state-of-the-art methods usually exploit bidirectional projection to align the cross-dimensional features and perform both 2D and 3D semantic segmentation tasks. However, to enable bidirectional mapping, this framework often requires a symmetrical 2D-3D network structure, thus limiting the network’s flexibility. Meanwhile, such dual-task settings may easily distract the network and lead to over-fitting in the 3D segmentation task. Because of the network’s inflexibility, fused features can only pass through a decoder network, which affects model performance due to insufficient depth. To alleviate these drawbacks, in this paper, we argue that despite its simplicity, unidirectionally projecting multi-view 2D deep semantic features into the 3D space, aligned with 3D deep semantic features, could lead to better feature fusion. On the one hand, the unidirectional projection makes our model focus more on the core task, i.e., 3D segmentation; on the other hand, moving from bidirectional to unidirectional projection enables a deeper cross-domain semantic alignment and offers the flexibility to fuse better and more complicated features from very different spaces. Among joint 2D-3D approaches, our proposed method achieves superior performance on the ScanNetv2 benchmark for 3D semantic segmentation.

Embedding Entity and Relation for Knowledge Graph by Probability Directed Graph

ABSTRACT. Knowledge graph embedding (KGE) represents the entities and relations in knowledge graphs (KGs) as low-dimensional dense vectors to improve the computational efficiency of downstream tasks. This paper regards the semantic relations between the head and tail entities in KGs as semantic probability transfers and proposes a KGE method based on a structured probability model. This method treats the relations as nodes and uses probability-directed graphs to model the KGs. The scoring function is defined as a probability distribution that represents the directed transitivity between the entities and relations. Finally, the function is used to infer the probability that a triple is true. Experimental results on several standard datasets show that this method achieves better results on complex relations and unevenly distributed triples.

Channel Attention Separable Convolution Network for Skin Lesion Segmentation

ABSTRACT. Skin cancer is a frequently occurring cancer in the human population, and it is very important to be able to diagnose malignant tumors in the body early. Lesion segmentation is crucial for monitoring the morphological changes of skin lesions and for extracting features to localize and identify diseases, assisting doctors in early diagnosis. Manual segmentation of dermoscopic images is error-prone and time-consuming, so there is a pressing demand for precise and automated segmentation algorithms. Inspired by advanced mechanisms such as U-Net, DenseNet, Separable Convolution, Channel Attention, and Atrous Spatial Pyramid Pooling (ASPP), we propose a novel network called Channel Attention Separable Convolution Network (CASCN) for skin lesion segmentation. The proposed CASCN is evaluated on the PH2 dataset with limited images. Without excessive pre-/post-processing of images, CASCN achieves state-of-the-art performance on the PH2 dataset with a Dice similarity coefficient of 0.9461 and an accuracy of 0.9645.

Few-Shot NER in Marine Ecology using Deep Learning

ABSTRACT. In the text of the marine ecological environment, marine ecological conditions and marine environmental pollution play an important role in the ocean and are the basis of the strategy of Marine Potestatem. However, Named Entity Recognition (NER) in marine ecology faces problems such as a small amount of domain text, weak semantic features of input vectors, and neglect of local features in traditional methods. To address the challenge of Named Entity Recognition in marine ecology with small samples, a few-sample model based on deep learning was designed. Firstly, the original text was used to train Sequence Generative Adversarial Nets (SeqGAN) to generate new text, and the original corpus was expanded. After that, the BERT-IDCNN-BiLSTM-CRF model was proposed for marine ecological entity extraction. BERT (Bidirectional Encoder Representation from Transformers) was pre-trained on the expanded corpus. Then the embeddings trained by BERT were input into Iterative Dilation Convolutional Networks (IDCNN) and Bidirectional Long Short-Term Memory Networks (BiLSTM) for feature extraction. Finally, Conditional Random Fields (CRF) were used to constrain the labeled sequence to get the result. For the proposed few-sample Named Entity Recognition method based on deep learning, horizontal and vertical comparative experiments were performed on the original corpus and the expanded corpus. Experimental results show that BERT-IDCNN-BiLSTM-CRF improves the F score by 3.33 percentage points on the original corpus, compared with the BiLSTM-CRF model based on character embedding. On the expanded corpus, the F score of BERT-IDCNN-BiLSTM-CRF is 2.65 percentage points higher than that on the original corpus. The method effectively improves entity extraction in marine ecology and provides a basis for downstream tasks such as the construction of a Knowledge Graph and ecological governance.

Efficient Hierarchical Reinforcement Learning via Mutual Information Constrained Subgoal Discovery

ABSTRACT. Goal-conditioned hierarchical reinforcement learning has demonstrated impressive capabilities in addressing complex and long-horizon tasks. However, the extensive subgoal space often results in low sample efficiency and challenging exploration. To address this issue, we leverage mutual information to measure the distance between subgoals and constrain their generation range to extract informative subgoals. By utilizing historical transitions from the replay buffer during off-policy training, we impose two constraints on the subgoals generation of the high-level policy: these subgoals should be reached with less effort by the low-level policy, and the realization of these subgoals can facilitate achieving the desired goals. These two constraints enable subgoals to act as critical links between the current states and the desired goals, providing more effective guidance to the low-level policy. The empirical results on continuous control tasks demonstrate that our proposed method significantly enhances the training efficiency across diverse environments, regardless of the dimensions of the state and action spaces. Furthermore, our method ensures comparable performance to state-of-the-art methods.

PTCP: Alleviate Layer Collapse in Pruning at Initialization via Parameter Threshold Compensation and Preservation

ABSTRACT. Over-parameterized neural networks have good performance, but training such networks is computationally expensive. Pruning at initialization (PaI) avoids training a full network, which has attracted intense interest. But at high compression ratios, layer collapse severely compromises the performance of PaI. Existing methods introduce operations such as iterative pruning to alleviate layer collapse. However, these operations require additional computing and memory costs. In this paper, we focus on alleviating layer collapse without increasing cost. Therefore, we propose an efficient strategy called parameter threshold compensation. This strategy constrains the lower limit of network layer parameters and uses parameter transfer to compensate for layers with fewer parameters. To promote a more balanced transfer of parameters, we further propose a parameter preservation strategy, using the average number of preserved parameters to more strongly constrain the layers that reduce parameters. We conduct extensive experiments on five pruning methods on Cifar10 and Cifar100 datasets using VGG16 and ResNet18 architectures, verifying the effectiveness of our strategy. Furthermore, we compare the improved performance with two SOTA methods. The comparison results show that our strategy achieves similar performance, challenging the design of increasingly complex pruning strategies.

On efficient federated learning for aerial remote sensing image classification: a filter pruning approach

ABSTRACT. The unmanned aerial vehicle (UAV) is an increasingly popular carrier for aerial remote sensing. Unfortunately, due to privacy constraints and limited communication resources, remote sensing images cannot be freely exchanged among UAV swarms to train an appropriate model for remote sensing image classification. Federated learning (FL) allows UAVs to train deep models with their own captured images while preserving data privacy, thus serving as an important technical building block for unleashing the potential of UAV swarms. However, limited computing and communication resources hinder the deployment of FL on UAVs. In this article, we propose a novel efficient federated learning framework, CALIM-FL, short for Cross-All-Layers Importance Measure pruning based Federated Learning. In CALIM-FL, an efficient iterative filter pruning mechanism is intertwined with the standard FL procedure. The model size is adapted during FL to reduce both communication and computation overhead at the cost of a slight accuracy loss. The novelties of this work come from the following two aspects: 1) a more accurate importance measure on filters from the perspective of the whole neural network; 2) a communication-efficient iterative pruning mechanism that does not require data from the devices. Comprehensive experimental results show that the proposed CALIM-FL is effective in a variety of scenarios, with a resource overhead saving of 88.4% at the cost of a 1% accuracy loss.

Propheter: Prophetic Teacher Guided Long-Tailed Distribution Learning

ABSTRACT. The problem of deep long-tailed learning, a prevalent challenge in the realm of generic visual recognition, persists in a multitude of real-world applications. To tackle the heavily skewed dataset issue in long-tailed classification, prior efforts have sought to augment existing deep models with elaborate class-balancing strategies, such as class rebalancing, data augmentation, and module improvement. Despite the encouraging performance, the limited class knowledge of the tail classes in the training dataset still bottlenecks the performance of existing deep models. In this paper, we propose an innovative long-tailed learning paradigm that breaks the bottleneck by guiding the learning of deep networks with external prior knowledge. This is specifically achieved by devising an elaborate "prophetic" teacher, termed "Propheter", that aims to learn the potential class distributions. The target long-tailed prediction model is then optimized under the instruction of the well-trained "Propheter", such that the distributions of different classes are as distinguishable as possible from each other. Experiments on eight long-tailed benchmarks across three architectures demonstrate that the proposed prophetic paradigm acts as a promising solution to the challenge of limited class knowledge in long-tailed datasets. The developed code is publicly available at https://github.com/tcmyxc/propheter.

Hierarchical Attribute-Based Encryption Scheme Supporting Computing Outsourcing and Time-Limited Access in Edge Computing

ABSTRACT. With the rapid increase of user data and traffic, the traditional attribute encryption scheme based on the central cloud causes a computing performance bottleneck. Moreover, in existing schemes the user's access privilege and the ciphertext are not limited in time or in number of uses, which leaves them open to brute-force attacks. We propose a solution that supports computing outsourcing and time-limited access in edge computing. Edge nodes can shorten data transmission distances and eliminate latency issues. To solve the central cloud performance problem during encryption and decryption, massive and complex computation is outsourced to edge nodes. A validity period is set for the ciphertext and the user key, which avoids their permanent validity and significantly improves data security. In addition, all attributes are organized into attribute trees, and the user's access privilege is judged according to the hierarchical relationship between attributes. Finally, we give a security proof, a performance cost analysis, and a functional comparison of the scheme.

The Construction of DNA coding sets by An Intelligent Optimization Algorithm: TMOL-TSO

ABSTRACT. DNA computing has a natural advantage in solving NP-complete problems due to its high concurrency and low energy consumption. With the development of DNA computing, the preliminary formation of DNA logic circuit architectures further illustrates the potential of this field. Additionally, DNA holds great potential for storage due to its high density, large capacity, and long-term stability. It is suitable for database construction and data storage. However, non-specific hybridization of DNA molecules may cause unexpected outcomes during computation or storage processes. Therefore, it is crucial to apply pressure constraints to DNA coding to ensure stability. Designing a DNA coding set that satisfies constraints is a primary challenge. In this paper, we propose a Tuna Swarm Optimization (TSO) algorithm that employs a random opposition-based learning strategy and a Two-Swarm Merge strategy. This algorithm has stronger global exploration capabilities. Experimental results demonstrate that this algorithm can find a better coding set in some cases.

A Novel Iterative Fusion Multi-Task Learning Framework for Solving Dense Prediction

ABSTRACT. Dense prediction tasks, such as semantic segmentation, monocular depth estimation, and edge estimation, are hot topics in computer vision that aim to produce a prediction for each pixel of the input image. With advances in deep learning, many dense prediction tasks have been greatly improved. Multi-task learning is one of the top research lines to boost task performance further. Properly designed multi-task model architectures achieve better performance and lower memory usage than single-task models. This paper proposes a novel Multi-task Learning (MTL) framework with a task pair interaction module to tackle several dense prediction tasks. Unlike the most widely used MTL structures, which share features at some specific layer and then branch into task-specific layers, in this paper the output task-specific features are remixed via a task pair interaction module (TPIM) to obtain more shared features. Due to joint learning, tasks are mutually supervised and provide rich shared information to each other for improving final results. The TPIM includes a novel cross-task interaction block (CIB) which comprises two attention mechanisms, self-attention and pixel-wise global attention. In contrast with the commonly used global attention mechanism, an Iterative Fusion Block (IFB) is introduced to effectively fuse affinity information between task pairs. Extensive experiments on two benchmark datasets (NYUD-v2 and PASCAL) demonstrate that our proposal is effective in comparison to existing methods.

Sequential Transformer for End-to-End Person Search

ABSTRACT. Person Search aims to simultaneously localize and recognize a target person from realistic and uncropped gallery images. One major challenge of person search comes from the contradictory goals of the two sub-tasks, i.e., person detection focuses on finding the commonness of all persons so as to distinguish persons from the background, while person re-identification (re-ID) focuses on the differences among different persons. In this paper, we propose a novel Sequential Transformer (SeqTR) for end-to-end person search to deal with this challenge. Our SeqTR contains a detection transformer and a novel re-ID transformer that sequentially addresses detection and re-ID tasks. The re-ID transformer comprises the self-attention layer that utilizes contextual information and the cross-attention layer that learns local fine-grained discriminative features of the human body. Moreover, the re-ID transformer is shared and supervised by multi-scale features to improve the robustness of learned person representations. Extensive experiments on two widely-used person search benchmarks, CUHK-SYSU and PRW, show that our proposed SeqTR not only outperforms all existing person search methods with a 59.3% mAP on PRW but also achieves comparable performance to the state-of-the-art results with an mAP of 94.8% on CUHK-SYSU.

An Ontology for Industrial Intelligent Model Library and Its Distributed Computing Application

ABSTRACT. In the context of Industry 4.0, the paradigm of manufacturing has shifted from autonomous to intelligent by integrating advanced communication technologies. However, to enable manufacturers to respond quickly and accurately to the complex environment of manufacturing, manufacturing knowledge requires a suitable representation. Ontology is a proper solution for knowledge representation, which is used to describe concepts and attributes in a specified domain. This paper proposes an ontology-based industrial model and significantly improves the interoperability of the models. Firstly, we conceptualize the attributes of the industrial models by providing concepts and their properties in the schema layer of the ontology. Then, according to the data collected from the manufacturing system, a number of instances are created and stored in the data layer. In addition, we present a prototype distributed computing application. The result suggests that the ontology can optimize the management of industrial models and achieve interoperability between models.

A Novel Framework for Forecasting Mental Stress Levels based on Physiological Signals

ABSTRACT. Mental stress may negatively affect individual health, life and work. Early intervention may help improve people's quality of life and avoid major accidents caused by stress. Unfortunately, current detection of stress levels can hardly meet requirements such as stress regulation, because it does not anticipate future stress changes; it is therefore necessary to forecast the state of mental stress. In this study, we propose a supervised learning framework to forecast mental stress levels. Firstly, we extract a series of features of physiological signals including electroencephalography (EEG) and electrocardiography (ECG); secondly, we apply various autoregressive (AR) models to forecast stress features based on the extracted features; finally, the forecasted features are fed into several conventional machine learning based classification models to forecast mental stress levels at subsequent time steps. We compare the effectiveness of the proposed framework against three competitive methods using three different datasets. The experimental results demonstrate that our proposed method outperforms those three methods and achieves a better forecasting accuracy of 89.65%. In addition, we present a positive correlation between mental state changes and forecasting results on the theta spectrum at the frontal region.

Multi-scale Structural Asymmetric Convolution for Wireframe Parsing

ABSTRACT. Extracting salient line segments with their corresponding junctions is a promising method for structural environment recognition. However, conventional methods extract these structural features using square convolution, which greatly restricts the model performance and leads to unthoughtful wireframes due to the incompatible geometric properties with these primitives. In this paper, we propose a Multi-scale Structural Asymmetric Convolution for Wireframe Parsing (MSACWP) to simultaneously infer prominent junctions and line segments from images. Benefiting from the similar geometric properties of asymmetric convolution and line segment, the proposed Multi-Scale Asymmetric Convolution (MSAC) effectively captures long-range context feature and prevents the irrelevant information from adjacent pixels. Besides, feature maps obtained from different stages in decoder layers are combined using Multi-Scale Feature Combination module (MSFC) to promote the multi-scale feature representation capacity of the backbone network. Sufficient experiments on two public datasets (Wireframe and YorkUrban) are conducted to demonstrate the advantages of our proposed MSACWP compared with previous state-of-the-art methods.

A Fine-Grained Domain Adaptation Method for Cross-Session Vigilance Estimation in SSVEP-Based BCI

ABSTRACT. Brain-computer interface (BCI), a direct communication system between the human brain and external environment, can provide assistance for people with disabilities. Vigilance is an important cognitive state and has a close influence on the performance of users in BCI systems. In this study, a four-target BCI system for cursor control was built based on steady-state visual evoked potential (SSVEP), and twelve subjects were recruited to carry out two long-term BCI experimental sessions, which consisted of two SSVEP-based cursor-control tasks. During each session, electroencephalogram (EEG) signals were recorded. Based on the labeled EEG data of the source domain (previous session) and a small amount of unlabeled EEG data of the target domain (new session), we developed a fine-grained domain adaptation network (FGDAN) for cross-session vigilance estimation in BCI tasks. In the FGDAN model, a graph convolution network (GCN) was built to extract deep features of EEG. Fine-grained feature alignment was proposed to highlight the importance of the different channels identified by the attention weights mechanism and to align the feature distributions between source and target domains at the channel level. The experimental results demonstrate that the proposed FGDAN achieved better performance than the compared methods and indicate the feasibility and effectiveness of our methods for cross-session vigilance estimation of BCI users.

S3ACH: Semi-Supervised Semantic Adaptive Cross-modal Hashing

ABSTRACT. Hash learning has been a great success in the field of large-scale data retrieval because of its superior retrieval efficiency and low storage consumption. However, labels for large-scale data are difficult to obtain, so supervised learning-based hashing methods are no longer applicable. In this paper, we introduce a method called Semi-supervised Semantic Adaptive Cross-modal Hashing (S3ACH), which improves the performance of unsupervised hash retrieval by exploiting a small amount of available label information. Specifically, we first propose a higher-order dynamic weight public space collaborative computing method, which balances the contribution of different modalities in the common potential space by invoking an adaptive higher-order dynamic variable. Then, the limited available label information is utilized to enhance the semantics of hash codes. Finally, we propose a discrete optimization strategy to reduce the quantization error introduced by the relaxation strategy and improve the accuracy of hash code production. The results show that S3ACH achieves better results than current advanced unsupervised methods and offers greater applicability while balancing performance compared with existing cross-modal hashing methods.

ADEQ: Adaptive Diversity Enhancement for Zero-shot Quantization

ABSTRACT. Zero-shot quantization is an effective way to compress neural networks, especially when real training sets are inaccessible because of privacy and security issues. Most existing synthetic-data-driven zero-shot quantization methods introduce diversity enhancement to simulate the distribution of real samples. However, the adaptivity between the enhancement degree and the network is neglected, i.e., whether the enhancement degree benefits different network layers and different classes, and whether it reaches the best match between inter-class distance and intra-class diversity. Due to the absence of a metric for class-wise and layer-wise diversity, a maladaptive enhancement degree runs the risk of mode collapse or inter-class inseparability. To address this issue, we propose a novel zero-shot quantization method, ADEQ. For layer-wise and class-wise adaptivity, the enhancement degree of different layers is adaptively initialized with a diversity coefficient. For inter-class adaptivity, an incremental diversity enhancement strategy is proposed to achieve the trade-off between inter-class distance and intra-class diversity. Extensive experiments on CIFAR-100 and ImageNet show that ADEQ achieves advanced performance at low bit-width quantization. For example, when ResNet-18 is quantized to 3 bits, we improve top-1 accuracy by 17.78% on ImageNet compared to state-of-the-art methods. Code at https://github.com/dangsingrue/ADEQ.

PLKA-MVSNet: Parallel Multi-View Stereo with Large Kernel Convolution Attention

ABSTRACT. In this paper, we propose PLKA-MVSNet to address the remaining challenges in the depth estimation of learning-based multi-view stereo (MVS) methods, particularly the inaccurate depth estimation in challenging areas such as low-texture regions, weak lighting conditions, and non-Lambertian surfaces. We ascribe this problem to the insufficient performance of the feature extractor and the information loss caused by the MVS pipeline transmission, and present our optimization scheme. Specifically, we introduce parallel large kernel attention (PLKA), which uses multiple small convolutions instead of a single large convolution, to enhance the perception of texture and structural information, which enables us to capture a larger receptive field and long-range information. In order to adapt to the coarse-to-fine MVS pipeline, we employ PLKA to construct a multi-stage feature extractor. Furthermore, we propose parallel cost volume aggregation (PCVA) to enhance the robustness of the aggregated cost volume. It introduces two decision-making attentions in the 2D dimension to make up for information loss and pixel omission in the 3D convolution compression. Notably, our method shows the best overall performance, surpassing the transformer-based method on the DTU dataset, and achieves the best results on the challenging Tanks and Temples advanced dataset.
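
The idea of replacing one large convolution with several small ones can be checked with simple receptive-field arithmetic. The sketch below uses the standard formula for stride-1 convolutions, rf = 1 + sum((k - 1) * d). The kernel sizes and dilation are illustrative assumptions, not the configuration used in PLKA-MVSNet, and the weight count assumes depth-wise convolutions.

```python
def receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions.

    layers: list of (kernel_size, dilation) tuples, applied in sequence.
    """
    rf = 1
    for kernel_size, dilation in layers:
        rf += (kernel_size - 1) * dilation
    return rf

def depthwise_weights_per_channel(layers):
    """Number of weights per channel if every layer is a depth-wise 2D convolution."""
    return sum(kernel_size ** 2 for kernel_size, _ in layers)

# Hypothetical stack: a 5x5 convolution followed by a 7x7 convolution with dilation 3.
small_stack = [(5, 1), (7, 3)]
# A single dense kernel covering the same receptive field.
single_large = [(23, 1)]

print(receptive_field(small_stack), depthwise_weights_per_channel(small_stack))
print(receptive_field(single_large), depthwise_weights_per_channel(single_large))
```

Under these assumptions both options cover a 23-pixel-wide window, but the stacked version needs 74 weights per channel against 529 for the single large kernel.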

Heterogeneous Graph Fusion with Adversarial Learning for Recommendation Service

ABSTRACT. The recommendation system has experienced rapid development, with various types of side information being utilized to address issues of data sparsity and the cold start problem. Social recommendation, in particular, relies on modeling social information to provide high-order information beyond user-item interaction. However, current approaches rely on graph neural network-based social network embedding, which can result in an over-smoothing problem. Additionally, graph diffusion, which encodes high-order features, can add noise to the model. Previous research has not adequately addressed the latent influence of social relations. Therefore, a new approach to investigating social relations is crucial for advancing social recommendation. In this work, we introduce a new recommendation framework named Heterogeneous Information Network Fusion with Adversarial Learning (HIN-FusionGAN), which inherently fuses adversarial learning-enhanced social networks with the fused graph between the user-item interaction graph and user-user social graph. We propose a heterogeneous information network that fuses social and interaction graphs into a unified heterogeneous graph, explicitly encoding high-order collaborative signals. We employ user embeddings using both interaction information and adversarial learning-enhanced social networks, which are efficiently fused by the feature fusion model. To address the issue of over-smoothing and uncover latent feature representation, we use the structure of an adversarial network. Comprehensive experiments on three real-world datasets demonstrate the superiority of our proposed model.

Enhancement of Masked Expression Recognition Inference Via Fusion Segmentation and Classifier

ABSTRACT. Despite remarkable advancements in facial expression recognition, recognizing facial expressions from occluded facial images in real-world environments remains a challenging task. Various types of occlusions randomly occur on the face, obstructing relevant information and introducing unwanted interference. Moreover, occlusions can alter facial structures and expression patterns, causing variations in key facial landmarks. To address these challenges, we propose a novel approach called the Occlusion Removal and Information Capture (ORIC) network, which fuses segmentation and classification. Our method consists of three modules: the Occlusion Awareness (OA) module learns about occlusions, the Occlusion Purification (OP) module generates robust and multi-scale occlusion masks to purify the occluded facial expression information, and the Expression Information Capture (EIC) module extracts comprehensive and robust expression features using the purified facial information. ORIC can eliminate the interference caused by occlusion and utilize both local region information and global semantic information to achieve facial expression recognition under occluded conditions. Through experimental evaluations on synthetic occluded RAF-DB and AffectNet datasets, as well as a real occluded dataset FED-RO, our method demonstrates significant advantages and effectiveness. Our research provides an innovative solution for recognizing facial expressions from occluded facial images in wild environments.

Prediction and analysis of acoustic displacement field using the method of neural network

ABSTRACT. Micro/nano manipulation technology holds significant value in diverse fields, such as biomedicine, precision killing, and material chemistry. Among these applications, acoustic manipulation technology stands out as a crucial approach for micro/nano manipulation. However, the determination of a control equation for acoustic manipulation proves challenging due to the complex motion characteristics exhibited by micro/nano targets. The mastery of acoustic displacement field information is essential for successful manipulation of micro targets. Addressing this challenge, this paper presents a novel method that utilizes neural networks for predicting sound wave displacement fields. The method involves collecting raw data on acoustic displacement changes under specific excitation frequencies and times using the acoustic manipulation experimental platform, and then using the data to train the neural network model. The network takes the initial position information of the micro target as input and accurately predicts the displacement field across the entire thin plate area, achieving a prediction accuracy of 0.5 mm. Through a comprehensive analysis of the network's prediction performance using test set data, the effectiveness of the designed network in solving the acoustic displacement field problem with high accuracy is demonstrated. This research significantly contributes to the advancement of acoustic manipulation technology, enabling precise control and manipulation of micro/nano targets across various applications.

Semantic Line Detection Using Deep-Hough Network with Attention Mechanism and Strip Convolution

ABSTRACT. Semantic lines are particular dominant lines in an image, which separate the image into several semantic regions and outline its conceptual structure. They play a vital role in image analysis, scene understanding, and other downstream tasks due to their semantic representation ability for the image layout. However, the accuracy and efficiency of existing semantic line detection methods still cannot meet the needs of real applications. Therefore, a new semantic line detection method based on a deep Hough transform network with strip convolution and attention is proposed. Firstly, the detection performance is improved by combining the channel attention mechanism with the feature pyramid network to alleviate the influence of redundant information. Then, the strip convolution and mixed pooling layer are introduced to effectively collect the remote information and capture the long-range dependencies between pixel backgrounds. Finally, the strategy of GhostNet is adopted to reduce the computational cost. Results of experiments on open datasets validate the proposed method, which is comparable to and even outperforms the state-of-the-art methods in accuracy and efficiency. Our code and pretrained models are available at: https://github.com/zhizhz/sml.

Intelligent UAV Swarm Planning Based on Undirected Graph Model

ABSTRACT. The coordination of multiple drones for formation flight and collaborative task execution in the air, known as drone cluster control, has become a key research focus in recent years. Collision avoidance during formation maintenance remains a challenging aspect of cluster control, as traditional cluster control and path planning algorithms struggle to enable individual drones to independently avoid obstacles and maintain formation within the cluster. To address this issue, this paper proposes a cluster modeling method based on an undirected graph, which optimizes the entire model by adding constraints. A distributed system is also utilized to plan the entire cluster, improving the efficiency and robustness of the system and allowing individual drones to execute their flight tasks independently. Experimental verification was conducted in a ROS-based simulation environment, and the results demonstrate that our proposed algorithm effectively maintains the formation of drone clusters with high performance and stability.

Adaptive Multi-hop Neighbor Selection for Few-shot Knowledge Graph Completion

ABSTRACT. Few-shot Knowledge Graph Completion (FKGC) is a special task proposed for the relations with only a few triples. However, existing FKGC models face the following two issues: 1) these models cannot fully exploit the dynamic relation and entity properties of neighbors to generate discriminative representations; 2) these models cannot filter out noise in higher-order neighbors to obtain reliable entity representations. In this paper, we propose an adaptive multi-hop neighbor selection model, namely AMBLE, to mitigate these two issues. Specifically, AMBLE first introduces a query-aware graph attention network (QAGAT) to obtain entity representations by dynamically aggregating one-hop neighbors based on relations and entities. Then, AMBLE aggregates higher-order neighbors by iterating QAGAT and LSTM, which can efficiently extract useful and filter noisy information. Moreover, a Transformer encoder is used to learn the representations of subject and object entity pairs. Finally, we build an attentional matching network to map the query to few support triples. Experiments show that AMBLE outperforms state-of-the-art baselines on two public datasets.

Amortized variational inference via Nosé-Hoover Thermostat Hamiltonian Monte Carlo

ABSTRACT. Sampling latents from the posterior distribution efficiently and accurately is a fundamental problem for posterior inference. Markov chain Monte Carlo (MCMC) is a useful tool for this, but it carries a heavy computational burden because it needs many transition steps to converge to the stationary distribution for each datapoint. Amortized variational inference within the framework of MCMC has thus been proposed, where the learned parameters of the model are shared by all observations. The Langevin autoencoder is a newly proposed method that amortizes inference in parameter space. This paper generalizes the Langevin autoencoder by utilizing the stochastic gradient Nosé-Hoover Thermostat Hamiltonian Monte Carlo to conduct amortized updating of the parameters of the inference distribution. The proposed method improves variational inference accuracy for the latent by subtly dealing with the noise introduced by the stochastic gradient without estimating that noise explicitly. Experiments benchmarking our method against baseline generative methods highlight the effectiveness of our proposed method.

Applications of Quantum Embedding in Computer Vision

ABSTRACT. Nowadays, Deep Neural Networks (DNNs) are fundamental to many vision tasks, including large-scale visual recognition. As the primary goal of the DNNs is to characterize complex boundaries of thousands of classes in a high-dimensional space, it is critical to learn higher-order representations for enhancing nonlinear modeling capability. Recently, a novel method called Quantum-State-based Mapping (QSM) has been proposed to improve the feature calibration ability of the existing attention modules in transfer learning tasks. QSM uses the wave function describing the state of microscopic particles to map the feature vector into the probability space. In essence, QSM introduces a novel higher-order representation to improve the nonlinear capability of the network. In this paper, we extend QSM to Quantum Embedding (QE) for designing new attention modules and Self-Organizing Maps, a class of unsupervised learning methods. We also conducted experiments to validate the effectiveness of QE.

Traffic Accident Forecasting Based on a GrDBN-GPR Model with Integrated Road Features

ABSTRACT. Traffic accidents pose a significant challenge in modern society, leading to substantial human loss and economic damage. Therefore, accurate forecasting of such accidents is of paramount importance in road safety status evaluation. However, models in many studies often prioritize individual factors like accuracy, stability, or anti-interference ability, rather than considering them comprehensively. Toward this end, this study presents a novel traffic accident forecasting model, known as the Gaussian radial Deep Belief Net - Gaussian Process Regression (GrDBN-GPR). This model integrates feature engineering and predictive algorithms to capture the intricate relationships among various traffic factors. The model comprises two key components. Firstly, the GrDBN uses the Gaussian-Bernoulli Restricted Boltzmann Machine (GBRBM) and Gaussian activation functions to extract valuable features more effectively and stably. This feature extraction mechanism enhances the ability to uncover meaningful patterns within the data. Secondly, the GPR can achieve stable predictions based on the informative features extracted by the GrDBN. Finally, this model is applied to evaluate the road safety status of Highway 401 in Ontario, Canada, using a set of collision data collected for over eight years. In comparison to six commonly used benchmark models, the predictive accuracy, stability, and resistance to interference of the proposed model are evaluated.

Graph Multi-Dimensional Feature Network

ABSTRACT. Graph Neural Networks (GNNs) have attracted extensive interest because of their superior performance in the field of graph representation learning. Most GNNs have a message passing mechanism to update node representations by aggregating and transforming input from node neighbors. The current methods use the same strategy to aggregate information from each feature dimension. However, according to current papers, the model will be more practical if the feature information of each dimension can be treated differently throughout the aggregating process. In this paper, we introduce a novel Graph Neural Network, the Graph Multi-Dimensional Feature Network (GMDFN). The method is accomplished by mining feature information from diverse dimensions and aggregating information using various strategies. Furthermore, a self-supervised learning module is built to keep the node feature information from being destroyed too much in the aggregation process, in order to avoid over-smoothing. A large number of experiments on different real-world datasets have shown that the model outperforms various current GNN models and is more robust.

Phishing Scam Detection for Ethereum Based on Community Enhanced Graph Convolutional Networks

ABSTRACT. Blockchain technology has garnered a lot of interest recently, but it has also become a breeding ground for various network crimes. Cryptocurrency, for example, has suffered losses due to network phishing scams, posing a serious threat to the security of blockchain ecosystem transactions. To create a favorable investment environment, we propose a community-enhanced phishing scam detection model in this paper. We approach network phishing detection as a graph classification task and introduce a network phishing detection graph neural network framework. Firstly, we construct an Ethereum transaction network and extract transaction subgraphs and the corresponding content features from it. Based on this, we propose a community-enhanced graph convolutional network (GCN)-based detection model. It extracts more reasonable node representations in the GCN neighborhoods and explores the advanced semantics of the graph by defining community structure and measuring the similarity of nodes in the community. This distinguishes normal accounts from phishing accounts. Experiments on different large-scale real-world Ethereum datasets consistently demonstrate that our proposed model performs better than related methods.

DTP: An Open-domain Text Relation Extraction Method

ABSTRACT. Open-domain text relation extraction (OpenTRE) is a subfield of information extraction that focuses on extracting relational facts from open-domain corpora. While recent OpenTRE research has shown promise in clustering unlabeled instances by leveraging knowledge from labeled data, the assumption that the testing set only contains open relations is impractical for real-world scenarios where known and open relations are mixed. Therefore, a novel OpenTRE method based on Dynamic Thresholds and Pair-based Self-weighting Loss (DTP) is proposed. It facilitates appropriate dataset processing by categorizing instances and accurately predicting unknown relations. Specifically, we break down OpenTRE into two stages: detecting and discovering, which makes the OpenTRE process more understandable. A sample-based dynamic threshold strategy is employed to clarify the relation boundaries. Additionally, the pair-based self-weighting loss improves the capture of semantic knowledge in labeled data and guides clustering. Experimental results indicate that this method outperforms strong baseline models on two datasets with significant improvements.

SSVEP Data Augmentation based on Filter Band Masking and Random Phase Erasing

ABSTRACT. Steady-state visual evoked potential (SSVEP)-based brain-computer interfaces (BCIs) have shown promising results in various applications, such as controlling prosthetic devices and augmented reality systems. However, current data-driven frequency recognition methods used to build high-performance SSVEP-BCIs often encounter overfitting and poor generalization when training data is limited. To address this issue, in this paper, we propose two potential SSVEP data augmentation methods, namely filter band masking (FBM) and random phase erasing (RPE), based on the inherent features of SSVEPs. These methods can produce high-quality supplementary training data to improve the performance of SSVEP-BCIs without parameter learning, making them easy to implement. To evaluate the proposed methods, two large-scale publicly available datasets (Benchmark and BETA) were used, and the experimental results showed that the proposed methods could significantly enhance the classification performance of baseline classifiers with a limited amount of calibration data. Specifically, evaluated on two methods, FBM increased the average accuracies by 7.40% and 8.55%, and RPE increased the average accuracies by 5.85% and 6.10%, respectively, with as few as two 1-second calibration trials on the Benchmark dataset. These findings demonstrate the potential of these data augmentation methods in enhancing the practicality of SSVEP-BCI for real-life scenarios.

MS3DAAM: Multi-scale 3-D Analytic Attention Module for Convolutional Neural Networks

ABSTRACT. Recently, an increasing number of sophisticated attention modules have been proposed to boost the performance of convolutional neural networks. However, they inevitably suffer from heavy computational overhead. In this paper, we propose a compact and effective module, called multi-scale 3-D analytic attention module (MS3DAAM), to address this challenge. We significantly reduce model complexity by developing a decoupling-and-coupling strategy. Firstly, we factorize the regular attention along the channel, height and width directions and then efficiently encode the information via 1-D convolutions, which greatly reduces the computational cost. Secondly, we multiply the weighted embedding results of the three direction vectors to regain a better 3-D attention map, which allocates an independent weight to each neuron, thus developing a unified measurement method for attention. Furthermore, a multi-scale method is introduced to further strengthen our module's localization capability by capturing both the inter-channel relationships and long-range spatial interactions from different receptive fields. Finally, we develop a structural re-parameterization technique for multi-scale 1-D convolutions to boost the inference speed. Extensive experiments in classification and object detection verify the superiority of our proposed method over other state-of-the-art counterparts. This factorizing-and-combining mechanism with the beauty of brevity can be further extended to simplify similar network structures.
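
The coupling step described above, multiplying three direction vectors to obtain one weight per neuron, can be illustrated with a minimal sketch. It assumes the three 1-D weight vectors are already given; the abstract does not specify how they are produced, so the function names and the plain outer product below are illustrative assumptions, not the authors' implementation.

```python
def attention_map_3d(channel_w, height_w, width_w):
    """Combine three 1-D weight vectors into a C x H x W attention map.

    ASSUMPTION: the map is the plain outer product of the three vectors,
    so entry [c][h][w] equals channel_w[c] * height_w[h] * width_w[w].
    """
    return [[[c * h * w for w in width_w] for h in height_w] for c in channel_w]


def apply_attention(features, attn):
    """Rescale a C x H x W feature tensor (nested lists) element-wise."""
    return [
        [[f * a for f, a in zip(f_row, a_row)] for f_row, a_row in zip(f_ch, a_ch)]
        for f_ch, a_ch in zip(features, attn)
    ]


# Hypothetical weights for a tensor with 2 channels, 2 rows and 3 columns.
attn = attention_map_3d([0.5, 1.0], [1.0, 0.2], [0.1, 1.0, 0.5])
features = [[[1.0] * 3 for _ in range(2)] for _ in range(2)]
weighted = apply_attention(features, attn)
```

Storing three vectors needs C + H + W values, while the resulting map still assigns a separate weight to each of the C x H x W positions.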

CBDN: A Chinese short-text classification model based on Chinese BERT and fused deep neural networks

ABSTRACT. To address the common issues in Chinese short-text classification caused by the lack of contextual information, ambiguity, and sparsity of semantic features due to the short length of the text, a feature fusion-based Chinese short-text classification model CBDN is proposed. Firstly, the Chinese-BERT-wwm pre-trained model, improved by the full-word masking technique, is selected as the embedding layer to output the vector representation of the short text. Secondly, to fully extract the limited semantic features of the short text, the model employs a multi-head self-attention module and a long connected bidirectional LSTM (LC-BiLSTM) network to further learn the semantic features, and then fuses the hidden layer output vector with the feature vector further processed by these two methods. Finally, to improve the classification performance, the fused features are input into an improved "pyramid CNN" (PCNN) layer, and the short-text classification result is obtained through the classifier. The CBDN model is experimentally compared with various baseline models on the THUCNews dataset. The experimental results show that the proposed model achieves an accuracy and precision of 94.38% and 94.37%, respectively, outperforming other baseline models, indicating that the model better extracts the semantic information of short text and effectively improves the classification performance of Chinese short text.

Multi-Task Feature Self-Distillation for Semi-Supervised Machine Translation

ABSTRACT. Because the most sophisticated neural machine translation models are all data-driven, model performance suffers dramatically when the scale of supervision data is constrained. Adding monolingual data to the model training process and using the back-translation method to create pseudo-parallel sentences for data augmentation is the currently accepted approach to address data shortage. However, this method's training procedure is laborious and time-consuming, and some of the created pseudo-parallel sentences are of poor quality and detrimental to the model. In this paper, we propose a semi-supervised training method, Multi-Task Feature Self-Distillation (MFSD), which can train models online on a mixed dataset consisting of bilingual and monolingual data jointly. Based on the supervised machine translation task, we propose a self-distillation task for the encoder and decoder of the source and target languages to train two kinds of monolingual data online. In the self-distillation task, we build a teacher model by integrating the student models of the previous rounds, and then use the feature soft labels output by the teacher model, which has more stable performance, to guide the student model training online. High-level knowledge is mined online from the monolingual data by comparing the consistency of the output features of the two models. We conduct experiments on the standard dataset and the multi-domain translation dataset respectively. The results show that MFSD performs better than mainstream semi-supervised and data augmentation approaches. Compared with supervised training, our method improves the BLEU score by an average of +2.27, and effectively improves the model's domain adaptability and generalization ability.

ADGCN: A Weakly Supervised Framework for Anomaly Detection in Social Networks

ABSTRACT. Detecting abnormal users in social networks is crucial for protecting user privacy and preventing criminal activities. However, existing graph learning methods have limitations. Unsupervised methods focus on topological anomalies and may overlook user characteristics, while supervised methods require costly data annotations. To address these challenges, we propose a weakly supervised framework called Anomaly Detection Graph Convolutional Network (ADGCN). Our model includes three modules: information-preserving compression of user features, collaborative mining of global and local graph information, and multi-view weakly supervised classification. We demonstrate that ADGCN generates high-quality user representations using minimal labeled data and achieves state-of-the-art performance on two real-world social network datasets. Ablation experiments and performance analyses show the feasibility and effectiveness of our approach in practical scenarios.

Light Field Image Super-Resolution via Global-View Information Adaptation-Guided Deformable Convolution Network

ABSTRACT. Light field (LF) cameras can record 3D scenes from multiple views, whose sub-aperture images contain both spatial and angular information. This information is beneficial for image super-resolution (SR). However, some existing CNN-based approaches to LF image SR ignore the utilization of global-view information. This information can reflect the correlation information among all LF. Moreover, due to the distinctive properties existing in each view, it is important to model an adaptation network for each LF image to supplement complementary information from other views. In this paper, we propose a global-view information adaptation-guided deformable convolution network (LF-GAGNet) for SR. Our network can dynamically achieve feature alignment from each individual-view domain to the global domain, which is guided by global-view information. Then, spatial and angular information is effectively integrated among all LF via an attention mechanism. Specifically, our LF-GAGNet is a dual-branch structure. The top branch can effectively extract global-view information, and dynamically construct the guidance factors for each view through a global-view adaptation-guided module (GAGM). Then, these distinctive factors are fed to a deformable convolution as offsets, which can achieve a global feature-level alignment in the bottom branch. Moreover, we propose an angular attention fusion module (AAFM) to adaptively supplement discriminative angular features for different LF views. Experimental results in different real scenarios demonstrate that our method outperforms other state-of-the-art methods in terms of SR accuracy and performance. Furthermore, the quantitative results show that our LF-GAGNet can also solve complex LF realistic scenarios well.

AGGDN: A Continuous Stochastic Predictive Model for Monitoring Sporadic Time Series on Graphs

ABSTRACT. Monitoring data of real-world networked systems could be sparse and irregular due to node failures or packet loss, which makes it a challenge to model the continuous dynamics of system states. Representing a network as a graph, we propose a deep learning model, Adversarial Graph-Gated Differential Network (AGGDN). To accurately capture the spatial-temporal interactions and extract hidden features from data, AGGDN introduces a novel module, dynDC-ODE, which empowers an Ordinary Differential Equation (ODE) with learning-based Diffusion Convolution (DC) to effectively infer relations among nodes and parameterize continuous-time system dynamics over the graph. It further incorporates a Stochastic Differential Equation (SDE) module and applies it over the graph to efficiently capture the underlying uncertainty of the networked systems. Different from any single differential equation model, the ODE part also works as a control signal to modulate the SDE propagation. With the recurrent running of the two modules, AGGDN can serve as an accurate online predictive model that is effective for either monitoring or analyzing the real-world networked objects. In addition, we introduce a soft masking scheme to capture the effects of partial observations caused by the random missing of data from nodes. As training a model with an SDE component could be challenging, Wasserstein adversarial training is exploited to fit the complicated distribution. Extensive results demonstrate that AGGDN significantly outperforms existing methods for online prediction.

Learning Item Attributes and User Interests for Knowledge Graph Enhanced Recommendation

ABSTRACT. Knowledge Graphs (KGs) manifest great potential in recommendation. This is ascribable to the rich attribute information contained in KG, such as the price attribute of goods, which is further integrated into item and user representations and improves recommendation performance as side information. However, existing knowledge-aware methods leverage attribute information at a coarse-grained level in two aspects: (1) item representations don’t accurately learn the distributional characteristics of different attributes, and (2) user representations don’t sufficiently recognize the pattern of user preferences towards attributes. In this paper, we propose a novel attentive knowledge graph attribute network(AKGAN) to learn item attributes and user interests via attribute information in KG. Technically, AKGAN adopts a novel graph neural network framework, which has a different design between the first layer and the latter layer. The first layer merges one-hop neighbors’ attribute information by concatenation operation to avoid breaking down the independence of different attributes, and the latter layer recursively propagates attribute information without weight decrease of high-order significant neighbors. With one attribute placed in the corresponding range of element-wise positions, AKGAN employs a novel interest-aware attention unit, which releases the limitation that the sum of attention weight is 1, to model the complexity and personality of user interests. Experimental results on three benchmark datasets show that AKGAN achieves significant improvements over the state-of-the-art methods like KGAT, KGIN and PRKG. Further analyses show that AKGAN offers interpretable explanations for user preferences towards attributes.

Leveraging Two-scale Features to Enhance Fine-grained Object Retrieval

ABSTRACT. Constructing a discriminative embedding for an image based on the features extracted by a convolutional neural network (CNN) has become a common solution for fine-grained object retrieval (FGOR). However, existing methods construct the embedding based solely on features extracted by the last layer of CNN, neglecting the potential benefits of leveraging features from other layers. Based on the fact that features extracted by different layers of CNN represent different abstraction and semantic information on those levels, we believe that leveraging features from multiple layers of CNN can construct a more discriminative embedding. Upon this, we propose a simple yet efficient end-to-end model named TSF-Enhance, which leverages two-scale features extracted by the CNN to construct the discriminative embedding. Specifically, we extract features from the third and fourth layers of Resnet50 and construct an embedding based on features from these two layers respectively. When testing, we concatenate these two embeddings to get a more discriminative embedding for retrieval. Additionally, we design a Feature Enhancement Module (FEM) that consists of several common operations, such as layer normalization, to process the features. Finally, we achieve competitive results on three FGOR datasets, specifically exceeding the current state-of-the-art performance on the most challenging dataset CUB200. Furthermore, our model also demonstrates strong scalability compared to localization-based methods, achieving the best performance on two general-purpose image retrieval datasets. The source code is available at https://github.com/jingyj203/TSF-Enhance.

Neural-Symbolic Recommendation with Graph-Enhanced Information

ABSTRACT. The recommendation system is not only a problem of inductive statistics from data but also a cognitive task that requires reasoning ability. The most advanced graph neural networks have been widely used in recommendation systems because they can capture implicit structured information from graph-structured data. However, like most neural network algorithms, they only learn matching patterns from a perception perspective. Some researchers use user behavior for logic reasoning to achieve recommendation prediction from the perspective of cognitive reasoning. However, this kind of reasoning is a local one and ignores implicit information on a global scale. In this work, we combine the advantages of graph neural networks and neural logic modules to construct a neuro-symbolic recommendation model with both global implicit reasoning ability and local explicit logic reasoning ability. We first build an item-item graph based on the principle of adjacent interaction and use graph neural networks to capture implicit information in global data. Then we transform user behavior into propositional logic expressions to achieve recommendations from the perspective of cognitive reasoning. Extensive experiments on five public datasets show that our proposed model outperforms several state-of-the-art methods. The source code is available at https://github.com/hanzo2020/GNNLR.

Attribution Guided Layerwise Knowledge Amalgamation from Graph Neural Networks

ABSTRACT. Knowledge Amalgamation (KA), aiming to transfer knowledge from multiple well-trained teacher networks to a multi-talented and compact student, is gaining attention due to its crucial role in resource-constrained scenarios. Previous literature on KA, although exhibiting promising results, is primarily geared toward Convolutional Neural Networks (CNNs). However, when transferred to Graph Neural Networks (GNNs) with non-grid data, KA techniques face new challenges that can be difficult to overcome. Moreover, the layerwise aggregation of GNNs produces significant noise as they progress from a shallow to a deep level, which can impede KA students' deep-level semantic comprehension. This work aims to overcome this limitation and proposes a novel strategy termed LAyerwIse Knowledge Amalgamation (LaiKA). It involves Hierarchical Feature Alignment between the teachers and the student, which enables the student to directly master the feature aggregation rules from teacher GNNs. Meanwhile, we propose a Selective Attribution Transfer (SAT) module that identifies task-relevant topological substructures to assist the capacity-limited student in mitigating noise and enhancing performance. Extensive experiments conducted on six datasets demonstrate that our proposed method equips a single student GNN to handle tasks from multiple teachers effectively and achieve comparable or superior results to those of the teachers without human annotations.

Contrastive Learning Augmented Graph Auto-Encoder for Graph Embedding

ABSTRACT. Graph embedding (GE) aims to embed the information of graph data into a low-dimensional representation space. Prior methods generally suffer from an imbalance between preserving structural information and node features due to their pre-defined inductive biases, leading to unsatisfactory generalization performance. In order to preserve the maximal information, graph contrastive learning (GCL) has become a prominent technique for learning discriminative embeddings. However, in contrast with graph-level embeddings, existing GCL methods generally learn less discriminative node embeddings in a self-supervised way. In this paper, we ascribe the above problem to two challenges: 1) graph data augmentations, which are designed for generating contrastive representations, hurt the original semantic information of nodes; 2) the nodes within the same cluster are selected as negative samples. To alleviate these challenges, we propose Contrastive Learning Augmented Graph Auto-Encoder (CGAE) and Variational Graph Auto-Encoder (CVGAE). Specifically, we first propose two distribution-dependent regularizations to guide the paralleled encoders to generate contrastive representations following similar distributions, followed by theoretical derivations to verify the equivalence of the above regularizations. Then, we utilize a truncated triplet loss, which only selects top-k nodes as negative samples, to avoid over-separating nodes affiliated to the same cluster. Experiments on several real-world datasets show that our models achieve better performance than all baselines in link prediction, node clustering, and graph visualization tasks.

Multi-View Stereo by Fusing Monocular and a Combination of Depth Representation Methods

ABSTRACT. The design of plane-sweep deep MVS primarily relies on patch-similarity based matching. However, this approach becomes impractical when dealing with low-textured, similar-textured and reflective regions in the scene, resulting in inaccurate matching results. One way to avoid this kind of error is to incorporate semantic information into the matching process. In this paper, we propose an end-to-end method that uses monocular depth estimation to add semantic information to deep MVS. Additionally, we analyze the advantages and disadvantages of two main depth representations and propose a collaborative method to alleviate their drawbacks. Finally, we introduce a novel filtering criterion named Distribution Consistency, which can effectively filter out outliers with poor probability distribution, such as uniform distribution, to further enhance the reconstruction quality.

Determination of local and global decision weights based on fuzzy modeling

ABSTRACT. An essential challenge in multi-criteria decision analysis (MCDA) is the determination of criteria weights. These weights map the decision maker’s preferences for decision problems in determining the importance of criteria. However, these values are not necessarily constant in the whole domain. Although many approaches are related to their determination, some MCDA models can have local weights that are difficult to map in global spaces. This paper focuses on an approach in which we determine global and local weights from the Characteristic Objects METhod (COMET) by using linear regression. Moreover, the obtained linear models are compared with COMET and Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) models to answer how similar they are. Then, the relationships between the obtained global and local weights are analyzed based on a simple case study. The results demonstrate the high sensitivity of the COMET method and the applicability of the proposed approach for determining global and local weights. The most useful contribution is the proposed approach to identify local weights that can be used for deeper decision analysis.

Predefined-time Synchronization of Complex Networks with Disturbances by Using Sliding Mode Control

ABSTRACT. This paper aims to investigate the issue of predefined-time synchronization in complex networks with disturbances using sliding mode control technology. Firstly, a new predefined-time stability lemma is proposed based on the Lyapunov second method. By adjusting parameters, this new lemma can degenerate into previous ones and is, therefore, more general. Based on the proposed stability lemma and sliding mode technology, a new sliding mode surface is designed and an effective new sliding mode controller is designed to ensure that the system achieves synchronization within the predefined time. Moreover, the new effective sliding mode controller proposed in this paper has two advantages as follows: (1) the proposed sliding mode controller is robust against disturbances, which aligns more with practical application requirements. (2) The predefined time is set as the sliding mode controller parameter, avoiding overestimation of synchronization time. Finally, numerical simulations are presented to demonstrate the effectiveness of the proposed approach.

A Fast and Scalable Frame-Recurrent Video Super-Resolution Framework

ABSTRACT. The video super-resolution (VSR) methods based on deep learning have become the mainstream VSR methods and have been widely used in various fields. Although many deep learning-based VSR methods have been proposed, they cannot be applied to real-time VSR tasks due to the vast computation and memory occupation. The lightweight VSR networks have faster inference speeds, but their super-resolution performance could be better. In this paper, we analyze the explicit and implicit motion compensation methods commonly used in VSR networks and design a fast and scalable frame-recurrent VSR network (FFRVSR). FFRVSR incorporates the Frame-Recurrent Network and Recurrent-Residual Network. This network structure can extract information from low-resolution video frames more efficiently and alleviate error accumulation during inference. We also design a super-resolution flow estimation network (SRFnet) that can more accurately estimate optical flow between video frames while reducing error information ingress. Extensive experiments demonstrate that the proposed FFRVSR surpasses state-of-the-art methods in terms of inference speed. FFRVSR also has strong scalability and can be adapted for both real-time video super-resolution tasks and high-quality video super-resolution tasks.

Diachronic Named Entity Disambiguation for Ancient Chinese Historical Records

ABSTRACT. Named entity disambiguation (NED) is a fundamental task in NLP. Although numerous methods have been proposed for NED in recent years, they ignore the fact that a large amount of corpora in the real world are diachronic by nature, such as historical documents or news articles, which have long time spans. As a consequence, most current methods fail to fully exploit the temporal information inside the corpora and knowledge bases. To address the above issues, we propose a novel model which integrates temporal features into a pretrained language model to make our model aware of time, and a new sample re-weighting scheme for diachronic NED which penalizes highly frequent mention-entity pairs to improve performance on rare and unseen entities. We present WikiCMAG and WikiSM, two new NED datasets annotated on ancient Chinese historical records. Experiments show that our model outperforms existing methods by large margins, proving the effectiveness of integrating diachronic information and our re-weighting scheme. Our model also achieves competitive performance in out-of-distribution (OOD) settings. The WikiSM dataset is publicly available at https://github.com/[to be disclosed upon acceptance].

Construction and Prediction of a Dynamic Multi-Relationship Bipartite Network

ABSTRACT. Bipartite networks are capable of representing complex systems that involve two distinct types of objects. However, existing bipartite networks have two limitations: 1) they are inadequate in characterizing multi-relationships among objects in complex systems, as they are restricted to depicting only one type of relationship; 2) they are limited to static representations of complex systems, hampering their ability to describe dynamic changes in the interactions among objects over time. Therefore, the Dynamic Multi-Relationship Bipartite Network (DMBN) model is introduced, which not only models the dynamic multi-relationships between two types of objects in complex systems, but also enables dynamic prediction of the intricate relationships between objects. Extensive experiments were conducted on complex systems, and the results indicate that the DMBN model is significantly better than the baseline methods across multiple evaluation metrics, thereby proving the effectiveness of the DMBN.

Reward-Dependent and Locally Modulated Hebbian Rule for Pattern Classification

ABSTRACT. In pattern classification tasks, the convolutional network has been widely used to detect the features and the error backpropagation with gradient descent (BPGD) algorithm has been used to train the network. However, the plasticity of the synapse between neurons often depends on the potential of the pre- and postsynaptic membrane and the local concentration of neuromodulators such as dopamine. In this paper, we propose the reward-dependent and locally modulated (RDLM) Hebbian rule to train a multi-layer network to perform pattern classification tasks. We found that by introducing local modulation, the reward-dependent Hebbian rule can successfully train multi-layer networks. We show that the performance of our method on the MNIST and Fashion-MNIST datasets can compete with the traditional BPGD algorithm. In conclusion, we propose a biologically plausible learning rule that can compete with traditional BPGD in pattern classification tasks. The method can potentially be used to train networks with complex architectures for complex tasks.

Contrastive Hierarchical Gating Networks for Rating Prediction

ABSTRACT. Review-based recommendations suffer from text noise and the absence of supervised signals. To address these challenges, we propose a novel hierarchical gated sentiment-aware model for rating prediction in this paper. To automatically suppress the influence of noisy reviews, we propose a hierarchical gating network to select informative textual signals at different levels of granularity. Specifically, a local gating module is proposed to select reviews with personalized end-to-end differential thresholds. The aim is to gate reviews in a relatively "hard" way to minimize the information flow from noisy reviews while facilitating the model training. A global gating module is employed to evaluate the overall usefulness of the review signals by estimating the uncertainties encoded in the historical reviews. In addition, a discriminative learning module is proposed to supervise the learning of the hierarchical gating network. The essential intuition is to exploit the sentiment consistencies between the target reviews and the target ratings for developing self-supervision signals so that the hierarchical gating network can select relevant reviews related to the target ratings for better prediction. Finally, extensive experiments on public datasets and comparison studies with state-of-the-art baselines demonstrate the effectiveness of the proposed model. Additional investigations also provide a deep insight into the rationale underlying the superiority of the proposed model.

DTSRN: Dynamic Temporal Spatial Relation Network for Stock Ranking Recommendation

ABSTRACT. The recommendation of stock ranking has always been a challenging task in the financial technology (FinTech) field. Achieving an excellent stock ranking result in stock ranking recommendations depends on mining the temporal relations within the stock and the complex spatial relations among the stocks. However, existing studies only consider the temporal relation features of stocks or introduce noise when extracting spatial relation features, which limits the performance of stock ranking recommendation tasks. To address this challenge, we propose the Dynamic Temporal Spatial Relation Network (DTSRN), which constructs a spatial relation graph with dynamic stock temporal relation features and extracts dynamic spatial relation features from different views for the stock ranking recommendation. Specifically, we construct learnable global-view and multi-view spatial graphs with stock temporal relation features and then employ efficient graph convolution operations to obtain the final stock representation. We extensively evaluate our method on two real-world datasets and compare it with several state-of-the-art approaches. The experimental results show that our proposed method outperforms the state-of-the-art baseline methods.

DGNN: Dependency Graph Neural Network for Multimodal Emotion Recognition in Conversation

ABSTRACT. In recent years, emotion recognition in conversation (ERC) has become a research hotspot in natural language processing due to its powerful applications. In ERC, the modeling of conversational dependency plays a crucial role. Existing methods often directly connect multimodal information and then build a graph neural network based on a fixed number of past and future utterances. The former leads to the lack of interaction between modalities, and the latter is less consistent with the logicality of the conversation. Therefore, we propose a Dependency Graph Neural Network (DGNN) for ERC. First, we present a cross-modal fusion transformer for modeling dependency between different modalities of the same utterance. Then, we design a directed graph neural network model based on the adaptive window for modeling dependency between utterances. The results of the extensive experiments on two benchmark datasets demonstrate the superiority of the proposed model.

All You See is the Tip of the Iceberg: Distilling Latent Interactions can Help You Find Treasures

ABSTRACT. Recommender systems suffer severely from the data sparsity problem, which can be attributed to the combined action of various possible causes such as gradually strengthened privacy protection policies, exposure bias, etc. In these cases, the unobserved items do not always refer to the items that users are not interested in; they could also be imputed to the inaccessibility of interaction data or users' unawareness of items. Thus, blindly fitting all unobserved interactions as negative interactions in the training stage leads to the incomplete modeling of user preferences. In this work, we propose a novel training strategy to distill latent interactions for recommender systems (DLI for short). Latent interactions (i.e., False-negative interactions) refer to the possible links between users and items that can reflect user preferences but have not happened. We first design a False-negative interaction selecting module to dynamically distill latent interactions along the training process. After that, we devise two types of loss paradigms: Truncated Loss and Reversed Loss. Truncated Loss can reduce the detrimental effect of False-negative interactions by discarding the False-negative samples in the loss computing stage, while Reversed Loss turns them into positive ones to enrich the interaction data. Meanwhile, both Truncated Loss and Reversed Loss can be further detailed into full mode and partial mode to discriminate different confidence levels of False-negative interactions. Extensive experiments on three benchmark datasets demonstrate the effectiveness of DLI in improving the recommendation performance of backbone models.
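
The difference between the Truncated Loss and the Reversed Loss described in this abstract can be illustrated with a minimal sketch. It assumes a binary cross-entropy base loss and an already-computed boolean flag per sample marking suspected False-negative interactions; the function names, the `flagged` list, and the toy numbers are hypothetical and are not taken from the paper.

```python
import math

def bce(p, y):
    """Binary cross-entropy for one predicted probability p and label y."""
    eps = 1e-12  # guards against log(0)
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

def truncated_loss(preds, labels, flagged):
    """Mean BCE over the samples that are NOT flagged as false negatives."""
    kept = [bce(p, y) for p, y, f in zip(preds, labels, flagged) if not f]
    return sum(kept) / len(kept) if kept else 0.0

def reversed_loss(preds, labels, flagged):
    """Mean BCE after relabeling flagged negatives as positives."""
    flipped = [1 if f else y for y, f in zip(labels, flagged)]
    return sum(bce(p, y) for p, y in zip(preds, flipped)) / len(preds)

# Toy batch (hypothetical): the second sample is an unobserved item that the
# model scores highly, so it is flagged as a suspected false negative.
preds = [0.9, 0.8, 0.1]
labels = [1, 0, 0]
flagged = [False, True, False]

plain = sum(bce(p, y) for p, y in zip(preds, labels)) / len(preds)
print(plain, truncated_loss(preds, labels, flagged), reversed_loss(preds, labels, flagged))
```

On this toy batch both variants give a lower loss than the plain objective, because the flagged sample no longer pushes the model away from an item the user may actually like.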

Detecting Adversarial Examples Via Classification Difference of a Robust Surrogate Model

ABSTRACT. Existing supervised learning detectors may suffer degraded performance when detecting unseen adversarial examples (AEs), because they may be sensitive to training samples. We found that (1) the CNN classifier is modestly robust against AEs generated from other CNNs, and (2) such adversarial robustness is rarely affected by unseen instances. So, we construct an attack-agnostic detector based on an adversarially robust surrogate CNN to detect unknown AEs. Specifically, for a protected CNN classifier, we design a surrogate CNN classifier and predict an image that receives different classification labels from the two as an AE. In order to detect transferable AEs and maintain a low false positive rate, the surrogate model is distilled from the protected model, aiming at enhancing the adversarial robustness (i.e., suppressing the transferability of AEs) while mimicking the output on clean images. To defend against the potential ensemble attack targeted at our detector, we propose a new adversarial training scheme to enhance the security of the proposed detector. Experimental results of generalization ability tests on CIFAR-10 and ImageNet-20 show that our method can detect unseen AEs effectively and performs much better than state-of-the-art methods.

Fooling Downstream Classifiers via Attacking Contrastive Learning Pre-trained Models

ABSTRACT. Nowadays, downloading a pre-trained contrastive learning (CL) encoder for feature extraction has become an emerging trend in computer vision tasks. However, few works pay attention to the security of downstream tasks when the upstream CL encoder is attacked by adversarial examples. In this paper, we propose an adversarial attack against a pre-trained CL encoder, aiming to fool the downstream classification tasks under black-box cases. To this end, we design a feature similarity loss function and optimize it to enlarge the feature difference between clean images and adversarial examples. Since the adversarial example forces the CL encoder to output distorted features at the last layer, it successfully fools the downstream classifiers, which rely heavily on the encoder’s feature output. Experimental results on three typical pre-trained CL models and three downstream classifiers show that our attack achieves much higher attack success rates than state-of-the-art methods, especially when attacking the linear classifier.

Neuron Attribution-Based Attacks Fooling Object Detectors

ABSTRACT. In this work, we propose a neuron attribution-based attack (NAA) to improve the transferability of adversarial examples, aiming at deceiving object detectors with different backbones or architectures. To measure the neuron attribution (importance) for a CNN layer of the detector, we sum the classification scores of all positive proposal boxes to calculate the integrated attention (IA), then obtain the neuron attribution matrix by element-wise multiplying IA with the feature difference between the clean image to be attacked and a black image. Considering that the summation may bias the importance values of some neurons, a mask is designed to drop out some neurons. The proposed loss calculated from the remaining neurons is minimized to generate adversarial examples. Since our attack disturbs the upstream feature outputs, it effectively disorders the outputs of downstream tasks, such as box regression and classification, and finally fools the detector. Extensive experiments on the PASCAL VOC and COCO datasets demonstrate that our method achieves better transferability than state-of-the-art methods.

Quantized SGD in Federated Learning: Communication, Optimization and Generalization

ABSTRACT. The surge of massive data and the increasing importance of data privacy preservation have sparked considerable interest in Federated Learning (FL) algorithms in the field of machine learning. FL has gained significant attention due to its impressive scalability properties and its ability to preserve data privacy. To improve the communication efficiency of FL, quantization techniques have emerged as commonly used approaches. However, the introduction of randomized quantization in FL can introduce additional variance, impacting the accuracy of the models. Furthermore, few studies in the existing literature have explored the impact of quantization on the generalization ability of FL algorithms. In this paper, we focus on examining the interplay among key factors in the widely used distributed Stochastic Gradient Descent (SGD) algorithm with quantization. Specifically, we investigate the relationship between quantization level, optimization error, and generalization performance. For convex objectives, our main results reveal several trade-offs between communication efficiency, optimization error, and generalization performance. In the case of non-convex objectives, our theoretical findings indicate that the quantization level has a more significant impact on the generalization ability compared to convex cases. Moreover, our derived generalization bounds for non-convex objectives suggest that early stopping may be necessary to ensure a certain level of generalization accuracy, even when the step size in SGD is very small. Finally, we conduct several numerical experiments utilizing logistic models and deep neural networks to validate and support our theoretical findings.
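
The remark that randomized quantization introduces additional variance can be illustrated with a minimal sketch of unbiased stochastic rounding on a uniform grid. This is a generic textbook quantizer, not necessarily the scheme analyzed in the paper; the function name, the `levels` parameter, and the sample values are assumptions made for illustration.

```python
import random

def stochastic_quantize(x, levels, rng):
    """Randomly round x in [0, 1] to a grid with `levels` intervals.

    Rounding up happens with probability equal to the fractional position of
    x between its two neighboring grid points, so the expected output is x.
    """
    scaled = x * levels
    lower = int(scaled)            # floor, since x >= 0
    if lower >= levels:            # x == 1.0 sits on the last grid point
        return 1.0
    prob_up = scaled - lower
    return (lower + (1 if rng.random() < prob_up else 0)) / levels

def mean_and_variance(samples):
    m = sum(samples) / len(samples)
    return m, sum((s - m) ** 2 for s in samples) / len(samples)

rng = random.Random(0)
coarse = [stochastic_quantize(0.3, 4, rng) for _ in range(20000)]
fine = [stochastic_quantize(0.3, 16, rng) for _ in range(20000)]
mean_coarse, var_coarse = mean_and_variance(coarse)
mean_fine, var_fine = mean_and_variance(fine)
print(mean_coarse, var_coarse, mean_fine, var_fine)
```

Both sample means stay close to 0.3, while the coarser grid (fewer bits to communicate) shows a larger variance, which is the kind of communication versus accuracy trade-off the abstract refers to.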

14:00-18:00 Session Poster Session 2: Poster Session
Chairs:
Robust Iterative Hard Thresholding Algorithm for Fault Tolerant RBF Network

ABSTRACT. In the construction of a radial basis function (RBF) network, there are three crucial issues. The first one is to select RBF nodes from training samples. Two additional important issues are how to address realization imperfections and mitigate the impact of outlier training samples. This paper considers that training data contain some outlier samples, and that in the implementation there is weight noise in the RBF weights. We formulate the construction of an RBF network as a constrained optimization problem, in which the objective function consists of two terms. The first term is designed to suppress the effect of outlier samples, while the second term handles the effect of weight noise. In our formulation, there is an ℓ0-norm constraint whose role is to select the training samples for constructing RBF nodes. We then develop the robust iterative hard thresholding algorithm (R-IHT) to solve the optimization problem based on the projected gradient concept. We theoretically study the convergence properties of the R-IHT. We use several benchmark datasets to verify the effectiveness of the proposed algorithm. The performance of our algorithm is superior to that of a number of state-of-the-art methods.
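
For readers unfamiliar with the projected gradient idea behind iterative hard thresholding, the following is a minimal sketch of the plain (non-robust) algorithm for an ℓ0-constrained least-squares problem. It is not the R-IHT of the paper: the outlier-suppressing and weight-noise terms are omitted, and the function names, step size, and toy data are assumptions.

```python
def hard_threshold(w, k):
    """Project onto the l0 ball: keep the k largest-magnitude entries of w."""
    keep = set(sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)[:k])
    return [w[i] if i in keep else 0.0 for i in range(len(w))]

def iht(A, y, k, step=1.0, iters=50):
    """Plain IHT for min 0.5 * ||A w - y||^2 subject to ||w||_0 <= k."""
    n = len(A[0])
    w = [0.0] * n
    for _ in range(iters):
        # Residual A w - y, one entry per row of A.
        residual = [sum(a * wi for a, wi in zip(row, w)) - yi
                    for row, yi in zip(A, y)]
        # Gradient A^T (A w - y).
        grad = [sum(A[r][j] * residual[r] for r in range(len(A)))
                for j in range(n)]
        # Gradient step followed by projection onto the sparsity constraint.
        w = hard_threshold([wi - step * g for wi, g in zip(w, grad)], k)
    return w

# Toy problem (hypothetical): identity design matrix, so the solution simply
# keeps the two largest-magnitude targets.
A = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
y = [3.0, 0.1, -2.0, 0.05]
w = iht(A, y, k=2)
print(w)
```

In an RBF setting the columns of `A` would correspond to candidate RBF nodes, so the nonzero entries of `w` indicate which training samples are selected as centers.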

Cross-lingual Knowledge Distillation via Flow-based Voice Conversion for Robust Polyglot Text-To-Speech

ABSTRACT. In this work, we introduce a framework for cross-lingual speech synthesis, which involves an upstream Voice Conversion (VC) model and a downstream Text-To-Speech (TTS) model. The proposed framework consists of 4 stages. In the first two stages, we use a VC model to convert utterances in the target locale to the voice of the target speaker. In the third stage, the converted data is combined with the linguistic features and durations from recordings in the target language, which are then used to train a single-speaker acoustic model. Finally, the last stage entails the training of a locale-independent vocoder. Our evaluations show that the proposed paradigm outperforms state-of-the-art approaches which are based on training a large multilingual TTS model. In addition, our experiments demonstrate the robustness of our approach with different model architectures, languages, speakers and amounts of data. Moreover, our solution is especially beneficial in low-resource settings.

A health evaluation algorithm for edge nodes based on LSTM

ABSTRACT. The health state of edge layer nodes significantly affects the reliability of the calculation and the development of related applications. Edge layer nodes health assessment is adopted to forecast node state to arrange calculation and application reasonably. The existing multidimensional data evaluation algorithms have achieved good predictive performance. However, with a small scale of training data, those algorithms could easily encounter overfitting and poor robustness. Therefore, we propose an evaluation algorithm based on the multidimensional operation data of edge layer nodes in this study. In order to solve the problem above, we propose an improved Long Short Term Memory (LSTM) model to realize the evaluation. We add feature discretization and annealing processes to the model to reduce the risk of model overfitting. Compared with typical time series prediction models, the proposed LSTM model has stronger applicability and better accuracy in the evaluation of network delay of edge layer nodes in our experiment.

Decision Support System based on MLP: Formula One (F1) Grand Prix Study Case

ABSTRACT. Neural networks are widely used due to the adaptability of models to many problems and high efficiency. These solutions are also gaining popularity in the design of Decision Support Systems. It leads to increased use of such techniques to support the decision-maker in practical problems. In this paper, we propose an Artificial Neural Network Decision Support System (ANN-DSS) based on Multilayer Perceptron. The model structure was determined by searching the optimal hyperparameters with Tree-structured Parzen Estimator. Based on the qualification results, the proposed system was directed to evaluate the Formula 1 drivers’ best lap time during the race. Obtained rankings were compared with reference rankings using the WS rank similarity. Model performance proves to be highly consistent in ranking predictions, which makes it a reliable tool for the given problem.

Graph Reinforcement Learning For Securing Critical Loads By E-mobility

ABSTRACT. Inefficient scheduling of electric vehicles (EVs) is detrimental to not only the profitability of charging stations but also the experience of EV users and the stable operation of the grid. Regulating the charging market by dynamic pricing is a feasible choice for EV coordinated scheduling. Power outages caused by natural disasters have always been a serious threat to critical loads such as hospitals and data centers. With the development of vehicle-to-grid (V2G) technology, the potential to attract EV users to the stations near the critical loads through dynamic pricing and to aggregate EVs into a flexible emergency supply to maintain the critical load is being explored. However, determining charging prices in real-time that are both attractive to users and profitable to stations is a challenging task, which is further complicated by the relationships and interactions between multiple stations. Therefore, this paper proposes the graph reinforcement learning (GRL) approach to seek the optimal pricing strategy to address the above problems. The experiment results show that the proposed method can effectively achieve profit maximization and EV scheduling for critical load maintenance.

EdgeMA: Model Adaptation System for Real-Time Video Analytics on Edge Devices

ABSTRACT. Real-time video analytics on edge devices for changing scenes remains a difficult task. As edge devices are usually resource-constrained, edge deep neural networks (DNNs) have fewer weights and shallower architectures than general DNNs. As a result, they only perform well in limited scenarios and are sensitive to data drift. In this paper, we introduce EdgeMA, a practical and efficient video analytics system designed to adapt models to shifts in real-world video streams over time, addressing the data drift problem. EdgeMA extracts the gray level co-occurrence matrix based statistical texture feature and uses the Random Forest classifier to detect the domain shift. Moreover, we have incorporated a method of model adaptation based on importance weighting, specifically designed to update models to cope with the label distribution shift. Through rigorous evaluation of EdgeMA on a real-world dataset, our results illustrate that EdgeMA significantly improves inference accuracy.
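
Importance weighting under label distribution shift can be sketched as reweighting each training class by the ratio of its target frequency to its source frequency. The sketch below is a generic illustration and not EdgeMA's implementation: it assumes that an estimate of the target label distribution is already available (here, hypothetical label lists), and the function name and class names are made up for the example.

```python
from collections import Counter

def label_shift_weights(source_labels, target_labels):
    """Per-class importance weights w(c) = p_target(c) / p_source(c)."""
    src = Counter(source_labels)
    tgt = Counter(target_labels)
    n_src, n_tgt = len(source_labels), len(target_labels)
    return {c: (tgt.get(c, 0) / n_tgt) / (src[c] / n_src) for c in src}

# Hypothetical example: the training data is dominated by cars, while the
# current video stream contains cars and persons equally often.
source = ["car"] * 8 + ["person"] * 2
target = ["car"] * 5 + ["person"] * 5
weights = label_shift_weights(source, target)
print(weights)
```

Each training sample's loss would then be multiplied by the weight of its class, so that under-represented classes in the training data count more during adaptation.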

PEVLR: A New Privacy-preserving and Efficient Approach for Vertical Logistic Regression

ABSTRACT. In our paper, we consider logistic regression in vertical federated learning. A new algorithm called PEVLR (Privacy-preserving and Efficient Vertical Logistic Regression) is proposed to efficiently solve vertical logistic regression with privacy preservation. To enhance the communication and computational efficiency, we design a novel local-update and global-update scheme for party A and party B, respectively. For the local update, we utilize hybrid SGD rather than vanilla SGD to mitigate the variance resulting from stochastic gradients. For the global update, the full gradient is adopted to update the parameter of party B, which leads to a faster convergence rate and fewer communication rounds. Furthermore, we design a simple but efficient plan to exchange intermediate information with a privacy-preserving guarantee. Specifically, random matrix sketching and randomly selected permutations are utilized to ensure the security of original data, label information and parameters under the honest-but-curious assumption. The experimental results show the advantages of PEVLR in terms of convergence rate, accuracy and efficiency, compared with other related models.

ESTNet: Efficient Spatio-Temporal Network for Industrial Smoke Detection

ABSTRACT. Computer vision has emerged as a cost-effective and convenient solution for identifying hazardous smoke emissions in industrial settings. However, in practical scenarios, the performance of existing methods can be affected by complex smoke characteristics and fluctuating environmental factors. To address these challenges, we propose a novel detection model called ESTNet. ESTNet utilizes both smoke texture features and unique motion features to enhance smoke detection. The Shallow Feature Enhancement Module (SFE) specifically enhances the learning of smoke texture features. The Spatio-temporal Feature Learning Module (SFL) effectively differentiates smoke from other interfering factors, enabling the establishment of smoke spatio-temporal feature learning. Notably, this module can be easily integrated into existing 2D CNNs, making it a versatile plug-and-play component. Furthermore, to improve the representation of the video, we employ Multi-Temporal Spans Fusion (MTSF) to incorporate information from multiple frames. This fusion technique allows us to obtain a comprehensive feature representation of the entire video. Extensive experiments and visualizations are conducted, demonstrating the effectiveness of our proposed method compared with state-of-the-art competitors.

Mastering Complex Coordination through Attention-based Dynamic Graph

ABSTRACT. The coordination between agents in multi-agent systems has become a popular topic in many fields. To catch the inner relationship between agents, the graph structure is combined with existing methods and improves the results. But in large-scale tasks with numerous agents, an overly complex graph would lead to a boost in computational cost and a decline in performance. Here we present DAGMIX, a novel graph-based value factorization method. Instead of a complete graph, DAGMIX generates a dynamic graph at each time step during training, on which it realizes a more interpretable and effective combining process through the attention mechanism. Experiments show that DAGMIX significantly outperforms previous SOTA methods in large-scale scenarios, as well as achieving promising results on other tasks.

Algorithm for Generating Tire Defect Images Based on RS-GAN

ABSTRACT. Aiming at the problems of poor image quality, unstable training and slow convergence in data expansion methods based on generative adversarial networks, this paper proposes RS-GAN, a generative model for tire defect images. Compared with traditional generative adversarial networks, RS-GAN integrates residual networks and attention mechanisms into an RSNet embedded in the adversarial network structure to improve the model's feature extraction ability. At the same time, the JS-divergence loss function of the traditional generative adversarial network is replaced by the Wasserstein distance with a gradient penalty term to improve the stability of model training. The experimental results show that the FID value of tire defect images generated by the RS-GAN model can reach 116.28, which is superior to the images generated by DCGAN, WGAN, CGAN, and SAGAN. Moreover, it achieves more competitive results on SSIM and PSNR. The RS-GAN model can stably generate high-quality tire defect images, providing an effective way to expand the tire defect dataset and alleviating the small sample problem faced by the development of deep learning in the field of defect detection.

Novel-Registrable Weights and Region-Level Contrastive Learning for Incremental Few-Shot Object Detection

ABSTRACT. Few-shot object detection (FSOD) methods learn to detect novel objects from a few examples, and they also require reusing base class data if detecting base objects is necessary. However, in some real applications, it is difficult to obtain old class data due to privacy or limited storage capacity, causing catastrophic forgetting when learning new classes. Therefore, incremental few-shot object detection (iFSOD) has attracted the attention of researchers in recent years. The iFSOD methods continuously learn novel classes without forgetting learned knowledge and without storing old class data. In this paper, we propose a novel method using novel-registrable weights and region-level contrastive learning (NWRC) for iFSOD. First, we use novel-registrable weights for RoI classification, which memorize class-specific weights to alleviate forgetting old knowledge and register new weights for novel classes. Then we propose region-level contrastive learning in the base training stage by proposal box augmentation, enhancing the generalizability of the feature representations and the plasticity of the detector. We verify the effectiveness of our method on two experimental settings of iFSOD on the COCO and VOC datasets. The results show that our method can learn novel classes from a few-shot dataset without forgetting old classes.

SORA: Improving Multi-agent Cooperation with a Soft Role Assignment Mechanism

ABSTRACT. Role-based multi-agent reinforcement learning (MARL) holds the promise of achieving scalable multi-agent cooperation by decomposing complex tasks through the concept of roles and has enjoyed great success in various tasks. However, conventional role-based MARL methods typically assign a single role to each agent, limiting the agent's behavior in certain scenarios. In real life, an individual usually performs multiple responsibilities in a given task. To meet such situations, we propose a novel soft role assignment (SORA) process that enables an agent to play multiple roles simultaneously. Concretely, SORA first generates a role distribution via the attention mechanism to interpret the agent's identity as a combination of different roles. To ensure consistent behavior with an agent's assigned role, we also introduce role-specific Q networks for decision-making. By virtue of these advances, our proposed method makes a prominent improvement over the prior state-of-the-art approaches on StarCraft multi-agent challenges and Google Research Football.
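The idea of a soft role distribution produced by attention can be sketched as follows. This is an illustrative sketch only: the abstract does not say how the role-specific Q networks are combined, so the probability-weighted sum in `mix_q_values` is an assumption, and `agent_query`, `role_keys`, and all function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def role_distribution(agent_query, role_keys):
    """Scaled dot-product attention of one agent embedding over all role embeddings."""
    scale = math.sqrt(len(agent_query))
    scores = [sum(q * k for q, k in zip(agent_query, key)) / scale
              for key in role_keys]
    return softmax(scores)

def mix_q_values(role_probs, role_q_values):
    """Assumed combination: probability-weighted sum of per-role Q-values.

    role_q_values[r][a] is the Q-value of action a under role r.
    """
    n_actions = len(role_q_values[0])
    return [sum(p * q[a] for p, q in zip(role_probs, role_q_values))
            for a in range(n_actions)]
```

With a hard (one-hot) role distribution this reduces to the conventional single-role case, where only one role's Q-values are used.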

Semantic-pixel Associative Information improving Loop Closure Detection and Experience Map Building for Efficient Visual Representation

ABSTRACT. RatSLAM is a brain-inspired simultaneous localization and mapping (SLAM) system based on the rodent hippocampus model, which is used to construct the experience map for environments. However, the map it constructs has the problems of low mapping accuracy and poor adaptability to changing lighting environments due to the simple visual processing method. In this paper, we present a novel RatSLAM system by using more complex semantic object information for loop closure detection (LCD) and experience map building, inspired by the effectiveness of semantic information for scene recognition in the biological brain. Specifically, we calculate the similarity between current and previous scenes in LCD based on the pixel information computed by the sum of absolute differences (SAD) and the semantic information extracted by the YOLOv2 network. Then we build an enhanced experience map with object-level information, where the 3D model segmentation technology is used to perform instance semantic segmentation on the recognized objects. By fusing complex semantic information in visual representation, the proposed model can successfully mitigate the impact of illumination and fully express the multi-dimensional information in the environment. Experimental results on the Oxford New College, City Center, and Lab datasets demonstrate its superior LCD accuracy and mapping performance, especially for environments with changing illumination.
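To make the pixel-level part of the scene comparison concrete, here is a minimal sketch of the sum of absolute differences (SAD) between two grayscale images. The abstract does not specify how SAD is fused with the semantic information, so `combined_similarity` below is a hypothetical fusion (a weighted mean of a normalized pixel similarity and the overlap of detected object labels); `alpha`, `max_pixel`, and the label lists are illustrative assumptions.

```python
def sad(image_a, image_b):
    """Sum of absolute differences between two equally sized grayscale images."""
    if len(image_a) != len(image_b) or any(
            len(ra) != len(rb) for ra, rb in zip(image_a, image_b)):
        raise ValueError("images must have the same shape")
    return sum(abs(a - b)
               for ra, rb in zip(image_a, image_b)
               for a, b in zip(ra, rb))

def combined_similarity(image_a, image_b, labels_a, labels_b,
                        alpha=0.5, max_pixel=255):
    """Hypothetical fusion of pixel similarity and semantic label overlap."""
    n_pixels = len(image_a) * len(image_a[0])
    # Map SAD to [0, 1]: 1.0 means identical images.
    pixel_sim = 1.0 - sad(image_a, image_b) / (n_pixels * max_pixel)
    # Jaccard overlap of detected object labels (e.g. from an object detector).
    union = set(labels_a) | set(labels_b)
    semantic_sim = len(set(labels_a) & set(labels_b)) / len(union) if union else 1.0
    return alpha * pixel_sim + (1.0 - alpha) * semantic_sim
```

A lighting change raises SAD even when the place is the same, whereas the detected object labels tend to stay stable, which is the intuition behind adding a semantic term.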

Hybrid U-Net: Instrument Semantic Segmentation in RMIS

ABSTRACT. Accurate semantic segmentation of surgical instruments from images captured by the laparoscopic system plays a crucial role in ensuring the reliability of vision-based Robot-Assisted Minimally Invasive Surgery. Despite numerous notable advancements in semantic segmentation, the achieved segmentation accuracy still falls short of meeting the requirements for surgical safety. To enhance the accuracy further, we propose several modifications to a conventional medical image segmentation network, including a modified Feature Pyramid Module. Within this modified module, Patch-Embedding with varying rates and Self-Attention Blocks are employed to mitigate the loss of feature information while simultaneously expanding the receptive field. As for the network architecture, all feature maps extracted by the encoder are seamlessly integrated into the proposed modified Feature Pyramid Module via element-wise connections. The resulting output from this module is then transmitted to the decoder blocks at each stage. Considering these hybrid properties, the proposed method is called Hybrid U-Net. Subsequently, multiple experiments were conducted on two available medical datasets and the experimental results reveal that our proposed method outperforms the recent methods in terms of accuracy on both medical datasets.

Human-Object Interaction Detection with Channel Aware Attention

ABSTRACT. Human-object interaction detection (HOI) is a fundamental task in computer vision, which requires locating instances and predicting their interactions. To tackle HOI, we attempt to capture the global context information in HOI scenes by explicitly encoding the global features using our novel channel aware attention mechanism. Our observation is that the context of an image, including people, objects, and background, plays an important role in HOI prediction. To leverage such information, we propose a channel aware attention, which applies global average pooling on the features to learn their channel-wise inter-dependency. Based on the channel aware attention, we develop a channel aware module and a channel aware encoder. Handling features in channel dimensions makes it convenient to encode the global features as well as to learn semantic features. We conduct experiments on two popular HOI datasets, V-COCO and HICO-DET, on which our model outperforms the baseline by clear margins. The visual analysis demonstrates that our method is able to capture abundant interaction-related features by attending to relevant regions.
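The step "global average pooling on the features to learn their channel-wise inter-dependency" can be sketched in a few lines. This is a generic squeeze-and-gate sketch under stated assumptions, not the paper's module: the single linear layer with a sigmoid and the `weights` matrix are hypothetical choices made for illustration.

```python
import math

def global_average_pool(features):
    """features: list of channels, each an H x W list of lists. Returns one mean per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
            for ch in features]

def channel_gates(pooled, weights):
    """One linear layer plus sigmoid; weights[i][j] mixes pooled channel j into gate i."""
    return [1.0 / (1.0 + math.exp(-sum(w * p for w, p in zip(row, pooled))))
            for row in weights]

def channel_attention(features, weights):
    """Rescale every channel by its gate, which depends on all channels' global means."""
    gates = channel_gates(global_average_pool(features), weights)
    return [[[g * v for v in row] for row in ch]
            for g, ch in zip(gates, features)]
```

Because each gate is computed from the pooled values of all channels, the rescaling of one channel depends on the global content of the others.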

Continual Domain Adaption for Neural Machine Translation

ABSTRACT. Domain Neural Machine Translation (NMT) with small datasets requires continual learning to incorporate new knowledge, as catastrophic forgetting is the main challenge that causes the model to forget old knowledge during fine-tuning. Additionally, most studies ignore the multi-stage domain adaptation of NMT. To address these issues, we propose a multi-stage incremental framework for domain NMT based on knowledge distillation. We also analyze how the supervised signals of the golden label and the teacher model work within a stage. Results show that the teacher model benefits the student model only in the early epochs and harms it in the later epochs. To solve this problem, we propose using two training objectives, one for the early and one for the later training. For the early epochs, conventional continual learning is retained to fully leverage the teacher model and integrate old knowledge. For the later epochs, the bidirectional marginal loss is used to get rid of the negative impact of the teacher model. The experiments show that our method outperforms multiple continual learning methods, with an average improvement of 1.11 and 1.06 on two domain translation tasks.
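The interplay of the two supervision signals (golden label and teacher model) can be illustrated with the standard knowledge-distillation objective for a single target token. This is a generic sketch and an assumption about the early-epoch objective only; it does not implement the bidirectional marginal loss, whose form the abstract does not give. The mixing weight `alpha` and the function names are hypothetical.

```python
import math

def cross_entropy(student_probs, gold_index):
    """Negative log-likelihood of the golden label under the student."""
    return -math.log(student_probs[gold_index])

def kl_divergence(teacher_probs, student_probs):
    """KL(teacher || student); student probabilities must be strictly positive."""
    return sum(t * math.log(t / s)
               for t, s in zip(teacher_probs, student_probs) if t > 0.0)

def distillation_loss(student_probs, teacher_probs, gold_index, alpha=0.5):
    """Weighted sum of the golden-label term and the teacher-matching term."""
    return ((1.0 - alpha) * cross_entropy(student_probs, gold_index)
            + alpha * kl_divergence(teacher_probs, student_probs))
```

Setting `alpha` to 0 removes the teacher's influence entirely, which gives one simple way to picture reducing the teacher signal in later epochs.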

Neural-Symbolic Reasoning with External Knowledge for Machine Reading Comprehension

ABSTRACT. Machine reading comprehension is a fundamental task in natural language understanding. Existing large-scale pre-trained language models and graph neural network-based models have achieved good gains on logical reasoning over text. However, neither of them can give a complete reasoning chain, while symbolic logic-based reasoning is explicit and explainable. Therefore, we propose a framework, LoGEK, that integrates symbolic logic and graph neural networks for reasoning, while leveraging external knowledge to augment the logical graph. The LoGEK model consists of three parts: logic extraction and extension, logical graph reasoning, and answer prediction. Specifically, LoGEK extracts and extends a logic set from the unstructured text. Then the logical graph reasoning module uses external knowledge to extend the original logical graph. After that, the model uses a path-based relational graph neural network to model the extended logical graph. Finally, the prediction module performs answer prediction based on graph embeddings and text embeddings. We conduct experiments on benchmark datasets for logical reasoning to evaluate the performance of LoGEK. The experimental results show that the accuracy of the proposed method is better than that of the baseline models, which verifies its effectiveness.

Modeling Both Collaborative and Temporal Information for Sequential Recommendation

ABSTRACT. Sequential recommendation has drawn a lot of attention due to the good performance it has shown in recent years. The temporal order of user interactions cannot be ignored in sequential recommendation, and user preferences are constantly changing over time. The application of deep neural networks in sequential recommendation has achieved many remarkable results, especially with self-attention based methods. However, previous works mainly focus on item-item temporal information of the sequence while ignoring the latent collaborative relations in user-item interactions. Therefore, we propose a new method named Collaborative-Temporal modeling for Sequential Recommendation (CTSR) to learn both collaborative relations and temporal information. The proposed CTSR method introduces a graph convolutional network to learn the user-item collaborative relations while using self-attention to capture item-item temporal information. We apply focal loss to reduce the loss contribution of the easy samples and increase the contribution of the hard samples, to train the model more effectively. We extract the most valuable item-item pairs, and then feed these pairs as augmented information into the adjacency matrix of the graph neural network. More importantly, it is the first work to encode the relative positions of items into their embeddings in sequential recommendation. The experimental results show that CTSR surpasses previous works on three real-world datasets.
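The effect of focal loss on easy versus hard samples, as described above, can be seen in a short sketch. This shows the standard binary focal loss without the optional class-balancing weight; whether CTSR uses this exact binary form is an assumption, and `gamma=2.0` is only a commonly used default.

```python
import math

def focal_loss(p, y, gamma=2.0, eps=1e-12):
    """Binary focal loss for predicted probability p and label y in {0, 1}.

    p_t is the probability assigned to the true class. The factor
    (1 - p_t)^gamma shrinks the loss of well-classified (easy) samples.
    """
    p_t = p if y == 1 else 1.0 - p
    p_t = min(max(p_t, eps), 1.0)  # guard against log(0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)
```

With `gamma=0` the expression reduces to ordinary cross-entropy. With `gamma=2`, a sample predicted at 0.9 for its true class keeps only about 1% of its cross-entropy loss, while a sample predicted at 0.1 keeps about 81%.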

Knowledge Distillation via Information Matching

ABSTRACT. Knowledge distillation can enhance network generalization by guiding a smaller student network to learn from a more complex teacher network. The challenge lies in maximizing the performance of the student network under the supervision of the teacher network. Currently, the feature-based distillation approach utilizes the middle-layer features of the teacher network to improve the performance of the student network. However, this approach lacks a measure to evaluate the content of the information present in the intermediate layers of both the teacher and student networks, which leads to a distillation mismatch of features and damages the student's performance. In this study, we propose a new feature distillation method to solve this problem. We measure the information content in the intermediate layers of the teacher and student networks based on the receptive fields of corresponding features. Subsequently, the suitable number and locations of transmission features are decided based on information content, effectively alleviating the risk of information mismatch during distillation. Our experimental results demonstrate that the proposed method significantly improves the performance of the student network.

Multi-level Attention Network with Weather Suppression for All-weather Action Detection in UAV Rescue Scenarios

ABSTRACT. UAVs have a significant advantage over traditional surveillance cameras in terms of mobility and range. Human action detection from UAV images can help in fields such as search and rescue. Meanwhile, UAV images face the challenges of height, angle, and small objects; they can also be affected by illumination and weather in harsh conditions. In this paper, we propose a Multi-level Attention network with Weather Suppression for all-weather action detection in UAV rescue scenarios. The Weather Suppression module helps reduce the effects of illumination and weather, and the Multi-level Attention module helps improve the model's performance in detecting small objects. We performed the detection task under normal and synthetic harsh conditions, and the results show that our model achieves state-of-the-art performance. The comparison of the relevant metrics also illustrates that our model has a relatively balanced size and complexity that facilitates deployment on UAV platforms; the ablation experiments demonstrate the contribution of each module.

A Stochastic Gradient-based Projection Algorithm for Distributed Constrained Optimization

ABSTRACT. This paper investigates a category of constrained convex optimization problems, where the collective objective function is represented as the sum of all local objective functions subject to local bounds and equality constraints. Problems of this kind arise in a variety of applications, such as power control, sensor networks, and source localization. To solve this problem more reliably and effectively, we propose a novel distributed stochastic gradient-based projection algorithm in the presence of noisy gradients, where the gradients are corrupted by arbitrary but uniformly bounded noise samples through local gradient observation. The proposed algorithm allows the adoption of a constant step-size, which guarantees a faster convergence rate compared with existing distributed algorithms with diminishing step-sizes. The effectiveness of the proposed algorithm is verified by simulation experiments.
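The core ingredients named above (noisy gradients with uniformly bounded noise, projection onto local bounds, a constant step-size) can be sketched for a single node. This is a simplified illustration under stated assumptions, not the proposed distributed algorithm: it omits the equality constraints and all communication between nodes, and the quadratic objective, the uniform noise model, and every name are hypothetical.

```python
import random

def project_box(x, lower, upper):
    """Euclidean projection onto the box [lower, upper], taken coordinate-wise."""
    return [min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper)]

def noisy_gradient(x, target, noise_bound, rng):
    """Gradient of sum_i (x_i - target_i)^2 plus uniformly bounded noise."""
    return [2.0 * (xi - ti) + rng.uniform(-noise_bound, noise_bound)
            for xi, ti in zip(x, target)]

def projected_sgd(x0, target, lower, upper,
                  step=0.1, iters=200, noise_bound=0.5, seed=0):
    """Constant step-size stochastic gradient descent with box projection."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        g = noisy_gradient(x, target, noise_bound, rng)
        x = project_box([xi - step * gi for xi, gi in zip(x, g)], lower, upper)
    return x
```

For example, minimizing (x - 3)^2 over the interval [0, 2] drives the iterate to the boundary point 2, because the noise bound of 0.5 is smaller than the gradient magnitude everywhere in the interval. When the minimizer lies in the interior, a constant step-size only brings the iterate into a neighborhood of the optimum whose radius scales with the noise bound.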

AAKD-Net: Attention-based Adversarial Knowledge Distillation Network for Image Classification

ABSTRACT. Deep neural networks have achieved remarkable success in various research fields, but they face limitations. Firstly, complex models are often required to handle challenging scenarios. Secondly, limited storage and processing power on mobile devices hinder model training and deployment. To address these challenges, we propose a novel approach using a compact and efficient student model to learn from a larger teacher model. To enhance feature map information extraction, we introduce an attention structure that leverages the rich features in the teacher model's feature maps. Adversarial training is incorporated by treating the student model as a generator and employing a discriminator to differentiate between teacher and student feature maps. Through an iterative process, the student model's feature map gradually approximates that of the teacher while improving the discriminator's discrimination abilities. By leveraging the knowledge of the teacher model and incorporating attention mechanisms and adversarial training, our approach provides a compelling solution to the challenges of complex model architectures and limited hardware resources. It achieves impressive performance enhancements with the student model.

FalconNet: Factorization for the Light-weight ConvNets

ABSTRACT. Designing light-weight CNN models with few parameters and FLOPs is a prominent research concern. However, three significant issues persist in the current light-weight CNNs: i) the lack of architectural consistency leads to redundancy and hinders capacity comparison, and obscures the causal link between architectural choices and performance gains; ii) the utilization of a single-branch depth-wise convolution compromises the model's representational capacity; iii) the depth-wise convolutions account for large proportions of parameters and FLOPs, while an efficient method to make them light-weight is lacking. To address these issues, we factorize the four vital components of light-weight CNNs from coarse to fine and redesign them: i) we design a light-weight overall architecture termed LightNet, which obtains better performance by simply implementing the basic blocks of other light-weight CNNs; ii) we abstract a Meta Light Block, which consists of a spatial operator and a channel operator and uniformly describes current basic blocks; iii) we propose RepSO, which constructs multiple spatial operator branches to enhance the representational ability; iv) we introduce the concept of receptive range, guided by which we propose RefCO to sparsely factorize the channel operator. Based on the above four vital components, we propose a novel light-weight CNN model termed FalconNet. Experimental results validate that FalconNet can achieve higher accuracy with fewer parameters and FLOPs compared to existing light-weight CNNs.

Abstractive Multi-document Summarization with Cross-Documents Discourse Relations

ABSTRACT. Generating a summary from a set of documents remains a challenging task. Abstractive multi-document summarization (MDS) methods have shown remarkable advantages when compared with extractive MDS. They can express the original document information in new sentences with higher continuity and readability. However, mainstream abstractive models, which are pre-trained on sentence pairs rather than entire documents, often fail to effectively capture long-range dependencies throughout the document. To address these issues, we propose a novel abstractive MDS model that aims to succinctly inject semantic and structural information of elementary discourse units into the model to improve its generative ability. In particular, we first extract semantic features by splitting the single document into discourses and building the discourse tree. Then, we design discourse Patterns to convert the raw document text and trees into a linearized format while guaranteeing corresponding relationships. Finally, we employ an abstractive model to generate target summaries with the processed input sequence and to learn the discourse semantic information. Extensive experiments show that our model outperforms current mainstream MDS methods in the ROUGE evaluation. This indicates the superiority of our proposed model and the capacity of the abstractive model with the hybrid pattern.

CenAD: Collaborative Embedding Network for Anomaly Detection with Leveraging Partially Observed Anomalies

ABSTRACT. Leveraging observed anomalies in anomaly detection can significantly improve detection accuracy. Assuming that observed anomalies cover all anomaly distributions, existing methods commonly learn the anomaly distributions from these observed anomalies and assign each object an anomaly score according to the similarities between it and observed anomalies. However, these observed anomalies may only partially cover the anomaly distributions, which severely restrains the performance in detecting uncovered anomalies. To address this issue, we propose a novel collaborative embedding network for this task, named CenAD. By leveraging partially observed anomalies, the collaborative learning derives a loss with maximum neighbor dispersion and minimum volume estimation as guidance to make anomalies more dispersed. Each object is assigned an anomaly score by its contribution to data dispersion, which distinguishes these anomalies from the entire data effectively. To investigate the effectiveness of CenAD with partially observed anomalies, we conduct extensive experiments on several datasets to validate the superiority of our method, in which we obtain an average improvement of up to 13.92% in AUC-ROC and 29.44% in AUC-PR compared with previous methods.

Paper Recommendation with Multi-View Knowledge-aware Attentive Network

ABSTRACT. The paper recommendation system aims to recommend potential papers of interest to users from massive data. Many efforts introduce knowledge graphs to solve problems such as data sparsity faced by traditional recommendation methods and use GNN-based techniques to mine the features of users and papers. However, existing work has not emphasized the quality of the knowledge graph construction, and has not optimized the modeling method for the scenario of paper recommendation, which leaves room for improvement in the quality of recommendation results. In this paper, we propose a Multi-View Knowledge-aware Attentive Network (MVKAN). Specifically, we first design a knowledge graph construction method based on keynote extraction for better recommendation assistance. We then design mechanisms for aggregation and propagation of graph attention from three views: the connectivity importance of entities, the user’s time preferences, and short-cut links to users based on tag similarity. This helps to model the representation of users and papers more effectively. Results from the experiments show that our model outperforms the baselines.

Biological Tissue Sections Instance Segmentation based on Active Learning

ABSTRACT. Precisely identifying the locations of biological tissue sections on the wafer is the basis for microscopy imaging. However, the sections made of different biological tissues are different in shape. Therefore, the instance segmentation network trained on the existing dataset may not be suitable for detecting new sections, and the cost of making a new dataset is high. To address this, this paper proposes an active learning algorithm for biological tissue section instance segmentation. The algorithm can achieve better results with only a few images for training when facing a new segmentation task of biological tissue sections. The algorithm adds a loss prediction module to the instance segmentation network, weights the uncertainty of the instance segmentation mask by the posterior category probability, and finally calculates the value of the sample. Then, we select the samples with the highest value as the training set, so that only a small number of samples need to be labeled for the network to achieve the expected performance. The algorithm is robust to different shapes of tissue sections and can be applied to various complex scenes to segment tissue sections automatically. Furthermore, experiments show that labeling only 30% of the samples in the whole training set allows the network to achieve the expected performance.

A High-Performance Tensorial Evolutionary Computation for Solving Spatial Optimization Problems

ABSTRACT. As a newly emerged evolutionary algorithm, tensorial evolution (TE) has shown promising performance in solving spatial optimization problems owing to its tensorial representation and tensorial evolutionary patterns. The TE algorithm sequentially performs different tensorial evolutionary operations on a single individual or pairs of individuals in a population during iterations. Since tensor algebra considers all dimensions of data simultaneously, TE is explicitly parallel at the dimension level. However, it is burdened with intensive tensor calculations, especially when encountering large-scale problems. How to extend TE to efficiently solve large-scale problems is one of the most pressing issues currently. Toward this goal, we first devise an efficient TE (ETE) algorithm which expresses all the evolutionary processes in a unified tensorial computational model. Compared to TE, the tensorial evolutionary operations are directly executed on a population rather than on sole individuals, enabling ETE to achieve explicit parallelism at both the dimension and individual levels. To further enhance the computational efficiency of ETE, we leverage the compute unified device architecture (CUDA), which provides access to computational resources on graphics processing units (GPUs). A CUDA-based implementation of ETE (Cu-ETE) is then presented that utilizes the GPU to accelerate tensorial evolutionary computation. Notably, Cu-ETE is the first implementation of tensorial evolution on GPU. Experimental results demonstrate the enhanced computational efficiency of both ETE (CPU) and Cu-ETE (GPU) over TE (CPU). By harnessing the power of tensorial algebra and GPU acceleration, Cu-ETE opens up new possibilities for efficient problem-solving in more complex and large-scale problems across various fields of knowledge.

Towards better evaluations of class activation mapping and interpretability of CNNs

ABSTRACT. As deep learning has been widely used in real life, there is an increasing demand for its transparency, and its interpretability has received much attention from all walks of life. Current efforts in this field include post-hoc visualization techniques and intrinsically interpretable frameworks. However, there are still shortcomings in both of these techniques. In post-hoc visualization techniques, the metrics for evaluating CAM methods suffer from ambiguity of the evaluation object. We propose a pair of quantitative evaluation metrics based on threshold cropping. The explanation maps are obtained by threshold cropping in order to minimize the variation of input images, thus making the evaluation object more focused on the CAM method itself. Experimental results show that these metrics can evaluate the accuracy and intensity of various CAM methods comprehensively, and lead to the conclusion that the gradient-based CAM methods are more stable, while Score-CAM is susceptible to the model. Meanwhile, most of the existing intrinsically interpretable frameworks tend to enhance the interpretability of models by mapping their intermediate results to concepts that can be understood by humans. However, the intermediate results of the model often contain a variety of information that hinders their correspondence with a single concept, which is manifested as a many-to-many relationship between filters and classes in convolutional classification networks. To address this situation, we propose an interpretable training framework based on mutual information neural maximization to alleviate filter-class entanglement. The MIS metric, classification confusion matrix, and adversarial attack experiments all confirm the validity of this method.

Multiscale Network with Equivalent Large Kernel Attention for Crowd Counting

ABSTRACT. Most of the existing crowd counting methods are based on convolutional neural networks (CNN) to solve the crowd scale and background noise problems. These methods can effectively extract local features, but their convolutional kernel sizes are limited, so it is hard to obtain global information, which is also crucial for scale awareness and noise discrimination. In this paper, we propose a Multiscale Network with Equivalent Large Kernel Attention for Crowd Counting (MELANet), which can extract both global and local information based on CNN. MELANet is composed of three parts: a feature extraction module (FEM) for original feature extraction, a multiscale equivalent attention module (MEAM) for global and local information combination, and a fusion module (FM) for multiscale feature fusion. In MEAM, by decomposing large convolution kernels into equivalent combinations of small convolution kernels, the model obtains receptive fields equivalent to those of the large convolutional kernels with lower complexity and fewer parameters. It enables local and global correlation in the attention mechanism based on CNN, which makes the model focus more on the crowd head region to resist the background noise. Besides, we use a multiscale structure and different convolution kernel sizes to encode contextual information at different scales into the feature maps to deal with head scale transformations. Furthermore, we add gate channel attention units in MEAM to enhance the channel adaptivity of the model. Extensive experiments demonstrate that MELANet can achieve excellent counting performance on three popular crowd counting datasets.
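The trade-off behind decomposing a large kernel into small ones can be checked with simple arithmetic. The sketch below computes the receptive field of a stack of stride-1 convolutions and the number of depth-wise weights per channel. The specific decompositions in the usage note are illustrative assumptions, not the configuration used in MELANet, and matching the receptive field does not by itself make the stack functionally identical to one dense large kernel.

```python
def stacked_receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions.

    layers: list of (kernel_size, dilation) tuples, applied in order.
    Each layer extends the field by (kernel_size - 1) * dilation.
    """
    rf = 1
    for kernel, dilation in layers:
        rf += (kernel - 1) * dilation
    return rf

def weights_per_channel(layers):
    """Depth-wise weights per channel: k * k for each layer (dilation adds none)."""
    return sum(kernel * kernel for kernel, _ in layers)
```

For instance, three stacked 3x3 convolutions cover a 7x7 field with 27 weights per channel instead of 49. A 5x5 convolution followed by a 7x7 convolution with dilation 3 covers a 23x23 field with 74 weights per channel instead of 529.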

M³FGM: A Node Masking and Multi-granularity Message passing-based Federated Graph Model for Spatial-Temporal Data Prediction

ABSTRACT. Researchers are addressing the challenges of spatial-temporal prediction by combining Federated Learning (FL) and graph models under the constraints of privacy and security. To make better use of the power of graph models, some studies also incorporate split learning (SL). However, several issues remain unaddressed: 1) clients might not be able to access the server during the inference phase; 2) the graph of clients designed manually in the server model may not reveal the proper relationships between clients. This paper proposes a new GNN-oriented split federated learning method, named node Masking and Multi-granularity Message passing-based Federated Graph Model (M³FGM), to address the above issues. For the first issue, the server model of M³FGM employs a MaskNode layer to simulate the case of clients being offline. We also redesign the decoder of the client model using a dual-sub-decoder structure so that each client model can use its local data to predict independently when offline. As for the second issue, a new GNN layer named the Multi-Granularity Message Passing (MGMP) layer enables each client node to perceive global and local information. We conducted extensive experiments in two different scenarios on two real traffic datasets. Results show that M³FGM outperforms the baselines and variant models, and achieves the best results on both datasets and in both scenarios.

Contrastive Learning-Based Music Recommendation Model

ABSTRACT. In the rapidly evolving era of digital multimedia, the overwhelming rate of music publication poses a challenge for users seeking efficient access to their preferred songs. Music recommendation systems aim to address this issue but still encounter problems such as overfitting, the cold start problem for new users, and result bias. To tackle these challenges, we propose an optimized music recommendation model called Contrastive Learning for Music Recommendation (CLMR), leveraging advanced deep learning and contrastive learning techniques. CLMR leverages the bipartite graph information between users and songs and introduces a contrastive learning framework to enhance the representation of sparse data, thereby improving recommendation accuracy and mitigating data sparsity issues. To combat sampling bias, a comparative learning approach is employed within CLMR, utilizing Gaussian noise to construct more effective positive samples. This method enhances the model's learning capability and robustness in challenging environments. Experimental comparisons with traditional recommendation models based on content filtering, collaborative filtering, and supervised learning demonstrate that the proposed CLMR model outperforms them, achieving superior performance in terms of NDCG and Recall metrics.
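The idea of building a positive sample by perturbing an embedding with Gaussian noise, and then contrasting it against other samples, can be sketched as follows. This is a generic InfoNCE-style sketch, not the CLMR implementation: the noise scale `sigma`, the `temperature`, the use of cosine similarity, and the function names are all assumptions. Input vectors are assumed to be non-zero.

```python
import math
import random

def add_gaussian_noise(vec, sigma, rng):
    """Create a positive view by adding independent Gaussian noise to each coordinate."""
    return [v + rng.gauss(0.0, sigma) for v in vec]

def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def info_nce(anchor, positive, negatives, temperature=0.2):
    """InfoNCE loss: pull the positive toward the anchor, push the negatives away."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))
```

A typical use would be `positive = add_gaussian_noise(anchor, 0.1, random.Random(0))`, with the embeddings of other users or songs in the batch serving as negatives.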

LenANet: A Length-controllable Attention Network for Source Code Summarization

ABSTRACT. Source code summarization aims to generate brief summaries that describe the functionality of source code. Existing approaches have made great breakthroughs through encoder-decoder models. They focus on learning common features contained in translation from source code to natural language summaries. As a result, they tend to generate generic summaries that are independent of the context and lack details. However, specific summaries which characterize specific features of code snippets are widely present in various datasets and real-world scenarios. Such summaries are rarely studied, as capturing specific features of source code is difficult. Moreover, learning only the common features results in only generic, short summaries being generated. In this paper, we present LenANet to generate specific summaries by considering the desired length information and extracting the specific code sentence. Firstly, we introduce a length offset vector to force the generation of summaries that contain a specific amount of information, laying the groundwork for generating specific summaries. Further, since forcing the model to generate summaries with a certain length would bring in invalid or generic descriptions, a context-aware code sentence extractor is proposed to extract specific features of the source code. Besides, we present a novel sentence-level code tree to capture the structural semantics and learn the representation of code sentences with a graph attention network, which is crucial for extracting specific features. Experimental results demonstrate that LenANet significantly outperforms the state-of-the-art baselines on the CodeXGLUE datasets with six programming languages and has the potential to generate specific summaries. In particular, the overall BLEU-4 is improved by 0.53 over CodeT5 with length control.

Unsupervised Monocular Depth Estimation with Semantic Reconstruction using Dual-Discriminator Generative Adversarial Networks

ABSTRACT. Monocular depth estimation is a key issue in the field of computer vision. The unsupervised learning framework has the advantage of not requiring data labels, and has become a hot research topic. Currently, most methods use view synthesis as a supervisory signal, resulting in unclear edges and semantic distortion of predicted results in some situations. We propose a new framework that introduces a semantic reconstruction loss to provide additional constraints for the network and improve the ability of the depth network to understand scenes. In addition, we propose a dual-discriminator adversarial training strategy to further strengthen semantic supervision and improve the accuracy of depth estimation. The test results show that our proposed method achieves competitive performance on the KITTI dataset [1].

Using Less But Important Information for Feature Distillation

ABSTRACT. The purpose of feature distillation is to use the teacher network to supervise the student network so that the student network can mimic the intermediate-layer representations of the teacher network. The most intuitive way of feature distillation is to use the Mean-Square Error (MSE) to optimize the distance between the feature representations at the same level of both networks. However, one problem in feature distillation is that the dimension of the intermediate-layer feature maps of the student network may be different from that of the teacher network. Previous work has mostly designed a projector to transform feature maps to the same dimension. In this paper, we propose a simple and straightforward feature distillation method that handles the feature dimension inconsistency between the teacher and student networks without an additional projector. We consider the redundancy of the data and show that it is not necessary to use all the information when performing feature distillation. In detail, we propose a cut-off operation for channel alignment and use singular value decomposition (SVD) for knowledge alignment so that only important information is transferred to the student network to solve the dimension inconsistency problem. Extensive experiments on several different models show that our method can improve the performance of student networks.

Efficient Mobile Robot Navigation Based on Federated Learning and Three-way Decisions

ABSTRACT. In the context of Industry 5.0, the importance of mobile robot navigation (MRN) cannot be emphasized enough, as it plays a pivotal part in fostering the partnership between machines and humans. To augment MRN capabilities, emerging technologies like federated learning (FL) are being utilized. FL enables the consolidation of knowledge from numerous robots located in diverse areas, enabling them to collectively learn and enhance their navigation skills. Through the integration of FL into MRN systems, Industry 5.0 can effectively harness the potential of collaborative intelligence to achieve efficient and high-quality production processes. When considering information representation in MRN, the adoption of picture fuzzy sets (PFSs), which expand upon the concept of intuitionistic fuzzy sets, offers significant advantages in effectively handling information inconsistencies in practical situations. Specifically, by leveraging the benefits of multi-granularity (MG) probabilistic rough sets (PRSs) and three-way decisions (3WD) within the FL framework, an efficient MRN approach based on FL and 3WD is thoroughly investigated. Initially, the adjustable MG picture fuzzy (PF) PRS model is developed by incorporating MG PRSs into the PF framework. Subsequently, the PF maximum deviation method is utilized to calculate various weights. In order to determine the optimal granularity of MG PF membership degrees, the CODAS (COmbinative Distance-based ASsessment) method is employed, known for its flexibility in handling both quantitative and qualitative attributes while effectively managing incomplete or inconsistent data with transparency and efficiency. Once the optimal granularity is determined, the MRN method based on FL and 3WD is established. Finally, a realistic case study utilizing MRN data from the Kaggle database is performed to validate the feasibility of our method.

PBTR: Pre-training and Bidirectional Semantic Enhanced Trajectory Recovery

ABSTRACT. The advancement of position acquisition technology has enabled the study based on vehicle trajectories. However, limitations in equipment and environmental factors often result in missing track records, significantly impacting the trajectory data quality. It is a fundamental task to restore the missing vehicle tracks within the traffic network structure. Existing research has attempted to address this issue through the construction of neural network models. However, these methods neglect the significance of the bidirectional information of the trajectory and the embedded representation of the trajectory unit. In view of the above problems, we propose a Seq2Seq-based trajectory recovery model that effectively utilizes bidirectional information and generates embedded representations of trajectory units to enhance trajectory recovery performance, which is a Pre-Training and Bidirectional Semantic enhanced Trajectory Recovery model, namely PBTR. Specifically, the road network’s representations extracting time factors are captured by a pre-training technique and a bidirectional semantics encoder is employed to enhance the expressiveness of the model followed by an attentive recurrent network to reconstruct the trajectory. The efficacy of our model is demonstrated through its superior performance on two real-world datasets.

Event-aware Document-level Event Extraction via Multi-granularity Event Encoder

ABSTRACT. Event extraction (EE) is a crucial task in natural language processing that entails identifying and extracting events from unstructured text. However, the prior research has largely concentrated on sentence-level event extraction (SEE), while disregarding the increasing requirements for document-level event extraction (DEE) in real-world scenarios. The latter presents two significant challenges, namely the arguments scattering problem and the multi-event problem, which are more frequently observed in documents. In this paper, we propose an event-aware document-level event extraction framework, which can accurately detect event locations throughout the entire document without triggers and encode information at three different granularities (i.e., event-level, document-level, and sentence-level) via a multi-granularity event encoder. The resulting event-related holistic representation is then utilized for subsequent event record generation, thereby improving the accuracy of argument classification. Our proposed model’s effectiveness is demonstrated through experimental results obtained from a large Chinese financial dataset.

GCM-FL: A Novel Granular Computing Model in Federated Learning for Fault Diagnosis

ABSTRACT. Industry 5.0, an intelligent manufacturing paradigm that emphasizes human-centric approaches and collaborative interaction between humans and machines, finds great value in federated learning (FL). FL proves to be an effective tool in Industry 5.0, particularly in the context of fault diagnosis (FD), as it enables the training of models on local devices and facilitates the sharing of updates to address data scarcity challenges. This approach ultimately enhances the accuracy and efficiency of the system. Moreover, when it comes to FD in the context of mine ventilator (MV), data privacy and security become crucial factors to consider. FL offers a solution by enabling distributed training without compromising the raw data, thus safeguarding data privacy and security. While FL effectively tackles the challenges of data scarcity and privacy protection during model training, it is essential to ensure proper management to mitigate potential risks to data privacy. Therefore, this research suggests the integration of FL with MVFD, resulting in the development of a triangular fuzzy information system (TFIS) and a three-way decision (TWD) model. The proposed methodology is constructed from adjustable multi-granularity (MG) triangular fuzzy (TF) probabilistic rough sets (PRSs). To evaluate the practicality of the algorithm model, an MV data set is utilized, and the ELECTRE (Elimination Et Choice Translating Reality) method is employed for optimal selections. Additionally, a contrast analysis is conducted to validate the applicability of the model, leading to the establishment of a group decision-making scheme for MVFD by referring to the proposed methodology.

A deep joint model of Multi-Scale intent-slots Interaction with Second-Order Gate for SLU.

ABSTRACT. Slot filling and intent detection are crucial tasks of Spoken Language Understanding (SLU). However, most existing joint models establish shallow connections between intent and slot by sharing parameters, which cannot fully utilize their rich interaction information. Meanwhile, the character and word fusion methods used in Chinese SLU simply combine the initial information without appropriate guidance, making it easy to introduce a large amount of noisy information. In this paper, we propose a deep joint model of Multi-Scale intent-slots Interaction with Second-Order Gate for Chinese SLU (MSIM-SOG). The model consists of two main modules: (1) the Multi-Scale intent-slots Interaction Module (MSIM), which cyclically updates the multi-scale information to achieve deep bi-directional interaction of intent and slots; (2) the Second-Order Gate Module (SOG), which controls the propagation of valuable information through the gate with second-order weights, reduces the noise information of fusion, accelerates model convergence, and alleviates model overfitting. Experiments on two public datasets demonstrate that our model outperforms the baseline and achieves state-of-the-art performance compared to previous models.

Visual Navigation of Target-Driven Memory-Augmented Reinforcement Learning

ABSTRACT. Compared to visual navigation methods based on reinforcement learning that rely on auxiliary information such as depth images, semantic segmentation, object detection, and relational graphs, methods that solely utilize RGB images do not require additional equipment and have better flexibility. However, these methods often suffer from underutilization of RGB image information, resulting in suboptimal model generalization performance. To address this limitation, we present the Target-Driven Memory-Augmented (TDMA) framework. This framework utilizes an external memory to store fused Target-Scene features obtained from the observed and target images. To capture and leverage long-term dependencies within this stored data, we employ the Transformer model to process historical information. Additionally, we introduce a self-attention sub-layer in the Decoder section of the Transformer to enhance the model's focus on similar regions between the observed and target images. Experimental evaluations conducted on the AI2-THOR dataset demonstrate that our proposed method achieves an 8% improvement in success rate and a 16% improvement in success weighted by path length compared to methods in the same experimental setup.

Instance-aware and Semantic-guided Prompt for Few-shot Learning in Large Language Models

ABSTRACT. The effectiveness of large language models (LLMs) and instruction learning has been demonstrated in different pre-trained language models (such as ChatGPT). However, current prompt learning methods usually use a unified template for the same task, and such a template struggles to capture significant information from different instances. To integrate semantic attention dynamically at the instance level, we propose ISPrompt, an instance-semantic-aware prompt learning model. Specifically, an instance-driven prompt generated from the semantic dependency tree is introduced. Then, the proposed model selects a suitable semantic prompt from the prompt selection pool to drive the prompt-based fine-tuning process. Our results show that the proposed model achieves state-of-the-art performance on few-shot learning tasks, which indicates that ISPrompt, by integrating instance semantics dynamically, can serve as a better knowledge-mining tool for PLMs.

Graph Attention Network Knowledge Graph Completion Model Based on Relational Aggregation

ABSTRACT. Knowledge Graph Completion (KGC) refers to inferring missing links based on existing triples. Research has found that graph neural networks perform well in this task. By using the topology of the graph and the characteristics of the nodes to learn, the feature representation of the nodes can be updated, and more semantic information can be obtained from the surrounding entities and relationships. This paper proposes an end-to-end structured Graph Attention Network Enhanced Relationship Aggregation (GANERA) knowledge graph completion model. Firstly, entity aggregation is performed on the central entity by adding an entity attention mechanism. The addition of entity attention can distinguish the importance of different neighbor entities and screen more important entity embeddings. At the same time, relationship-specific parameters are introduced to enhance the expressive power of the message function, which also enables the model to extract relationship information in the parameter space. Finally, a ConvR convolutional network is used as a decoder. Through experiments, we demonstrate the effectiveness of the proposed model on the standard FB15k-237 and WN18RR datasets, and achieve good results on other datasets, while also achieving relative improvements in Hits@N and MRR values compared to other models.

Landmark-assisted Facial Action Unit Detection with Optimal Attention and Contrastive Learning

ABSTRACT. In this paper, we propose a weakly-supervised algorithm for facial action unit (AU) detection in the wild, which combines basic facial features and attention-based landmark features as well as contrastive learning to improve the performance of AU detection. Firstly, the backbone is a weakly-supervised algorithm since AU datasets in the wild are scarce and the utilization of other public datasets can capture robust basic facial features and landmark features. Secondly, we explore and select the optimal attention-based landmark encoder to capture facial landmark features that have been shown to be highly related to AUs. Then, we combine basic facial features and attention-based landmark features for AU detection. Finally, we propose a weighted multi-label contrastive loss function for the further improvement of AU detection. Extensive experiments on RFAU and BP4D demonstrate that our method outperforms or is comparable with state-of-the-art weakly-supervised and supervised AU detection methods. We add the code in the supplementary files, which will be made publicly available later.

An Effective Morphological Analysis Framework of Intracranial Artery in 3D Digital Subtraction Angiography

ABSTRACT. Acquiring accurate anatomy information of intracranial artery from 3D digital subtraction angiography (3D-DSA) is crucial for intracranial artery intervention surgery. However, this task often comes with challenges of large-scale image and memory constraints. In this paper, an effective two-stage framework is proposed for fully automatic morphological analysis of intracranial artery. In the first stage, the proposed Region-Global Fusion Network (RGFNet) achieves accurate and continuous segmentation of intracranial artery. In the second stage, the 3D morphological analysis algorithm obtains the access diameter, the minimum inner diameter and the minimum radius of curvature of intracranial artery. RGFNet achieves state-of-the-art performance (93.36% in Dice, 87.83% in mIoU and 15.64 in HD95) in the 3D-DSA intracranial artery segmentation dataset, and the proposed morphological analysis algorithm also shows effectiveness in obtaining accurate anatomy information. The proposed framework is not only helpful for surgeons to plan the procedures of interventional surgery but also promising to be integrated to robotic navigation systems, enabling robotic-assisted surgery.

Scalable Bayesian Tensor Ring Factorization for Multiway Data Analysis

ABSTRACT. Tensor decompositions play a crucial role in numerous applications related to multi-way data analysis. By employing a Bayesian framework with sparsity-inducing priors, Bayesian Tensor Ring (BTR) factorization offers probabilistic estimates and an effective approach for automatically adapting the tensor ring rank during the learning process. However, the previous BTR method employs an Automatic Relevance Determination (ARD) prior, which can lead to sub-optimal solutions. Besides, it solely focuses on continuous data, whereas many applications involve discrete data. More importantly, it relies on the Coordinate-Ascent Variational Inference (CAVI) algorithm, which is inadequate for handling large tensors with extensive observations. These limitations greatly restrict its application scale and scope, making it suitable only for small-scale problems, such as image/video completion. To address these issues, we propose a novel BTR model that incorporates a nonparametric Multiplicative Gamma Process (MGP) prior, known for its superior accuracy in identifying latent structures. To handle discrete data, we introduce the Polya-Gamma augmentation for closed-form updates. Furthermore, we develop an efficient Gibbs sampler for consistent posterior simulation, which reduces the computational complexity of the previous VI algorithm by two orders of magnitude, and an online EM algorithm that is scalable to extremely large tensors. To showcase the advantages of our model, we conduct extensive experiments on both simulation data and real-world applications.

Multi-Neuron Information Fusion for Direct Training Spiking Neural Networks

ABSTRACT. Spiking neural networks (SNNs) are currently receiving increasing research attention. Most existing SNNs utilize a single class of neuron models. These approaches fail to consider features such as diversity and connectivity of biological neurons, thus limiting their adaptability to different image datasets. Inspired by the gap junctions in neuroscience, we propose a multi-neuron information fusion (MIF) model. This model incorporates multiple neuron models, forming neuron groups that can reflect biological plausibility while aiming to improve experimental performance. We evaluate the proposed model on the MNIST, Fashion-MNIST, CIFAR10, and N-MNIST datasets, and the experimental results show that it can achieve competitive results with fewer time steps.

Multi-Scale Local Region-Based Facial Action Unit Detection with Graph Convolutional Network

ABSTRACT. Facial action unit (AU) detection is crucial for general facial expression analysis. Different AUs cause facial appearance changes over various regions at different scales, and may interact with each other. However, most existing methods fail to extract the multi-scale feature at local facial region, or consider the AU relationship in the classifiers. In this paper, we propose a novel multi-scale local region-based facial AU detection framework with Graph Convolutional Network (GCN). The proposed framework consists of two parts: multi-scale local region-based (MSLR) feature extraction and AU relationship modeling with GCN. Firstly, to extract the MSLR features, we build the improved AU centers for each AU, and then extract multi-scale feature around the centers with several predefined windows. Secondly, we employ the GCN framework to model the relationship between AUs. Specifically, we build the graph of AUs, then utilize two GCNs to update the MSLR feature and the AU classifiers respectively. Finally, the AU predicted probability is determined by both the multi-scale local feature and the relationship between AUs. Experimental results on two widely used AU detection datasets BP4D and DISFA show that the proposed algorithm outperforms the state-of-the-art methods.

Event-based Object Recognition Using Feature Fusion and Spiking Neural Networks

ABSTRACT. Event-based cameras have garnered growing interest in computer vision due to the advantages of their sparse spatio-temporal representation. The spiking neural network (SNN), as a representative brain-inspired computing model, is inherently suitable for event-driven processing. However, event-based SNNs still have shortcomings in using multiple feature extraction methods, such as the loss of feature information. In this work, we propose an event-based hierarchical model using feature fusion and SNN for object recognition. In the proposed model, the input event stream is adaptively sliced into a segment stream for the subsequent feature extraction and SNN with the Tempotron rule. The model utilizes feature mapping to preserve the orientation features extracted by a Gabor filter for the fusion of orientation features and time-surface features, considering the surrounding past events within the time window. The experiments conducted on several event-based datasets (i.e., N-MNIST, MNIST-DVS, DVS128Gesture and DailyAction-DVS) show superior performance of the proposed model, and the ablation study demonstrates the effectiveness of feature fusion for object recognition.

CRE: An Efficient Ciphertext Retrieval Scheme based on Encoder

ABSTRACT. Searchable Encryption is utilized to address the issue of searching for outsourced encrypted data on third-party untrusted cloud servers. Traditional approaches for ciphertext retrieval are limited to basic keyword-matching queries and fall short when it comes to handling complex semantic queries. Although several semantic retrieval schemes have been proposed in recent years, their performance is inadequate. This paper introduces a semantic retrieval scheme called CRE (Ciphertext Retrieval based on Encoder), which leverages the prompt-based RoBERTa pre-trained language model to generate precise embeddings for sentences in queries and documents. Moreover, to improve retrieval speed in the face of massive high-dimensional sentence embedding vectors, we introduce the HNSW algorithm. Through experimentation and theoretical analysis, this paper demonstrates that CRE outperforms SSSW2 and SSRB2 in terms of retrieval speed and accuracy.

Circular FC: Fast Fourier Transform Meets Fully Connected Layer For Convolutional Neural Network

ABSTRACT. The fully connected (FC) layer is generally located after the global pooling layer in the convolutional neural network (CNN). Its essence is the weighted summation of the features extracted from the previous convolutional layers, that is, feature remapping. However, the FC layer with close internal correlation inevitably brings parameter redundancy. To alleviate this problem, we propose a novel lightweight FC-like module, dubbed Circular FC, by constructing weight parameters in a circular manner. Inspired by digital signal processing theories, we implement Circular FC by fast Fourier transform (FFT) based on the circular convolution theorem of discrete signals. Circular FC is designed to be a plug-and-play classification head and can be easily embedded into existing CNNs such as VGG, Xception, DenseNet, and ResNets. Extensive experiments on CIFAR-10, CIFAR-100, and ImageNet datasets illustrate that the above networks equipped with Circular FC reduce the number of parameters while maintaining comparable image classification performance.
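
The circular convolution theorem mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact layer construction, so the following assumes a square weight matrix that is circulant (every column is a cyclic shift of one length-n vector `w`), with no bias. The function names are hypothetical, and a naive O(n^2) DFT stands in for a real FFT only to keep the example dependency-free.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform (a stand-in for an FFT routine)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def circulant_matvec_direct(w, x):
    """Multiply x by the circulant matrix whose first column is w.

    Entry (i, j) of that matrix is w[(i - j) mod n], so only n weights
    are stored instead of n * n.
    """
    n = len(w)
    return [sum(w[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]


def circulant_matvec_fourier(w, x):
    """Same product via the circular convolution theorem:
    DFT(w circularly convolved with x) = DFT(w) * DFT(x), elementwise.
    """
    spectrum = [a * b for a, b in zip(dft(w), dft(x))]
    return [v.real for v in dft(spectrum, inverse=True)]


# Assumed toy weights and features, for illustration only.
w = [1.0, 2.0, 3.0, 4.0]
x = [1.0, 0.0, -1.0, 2.0]
y_direct = circulant_matvec_direct(w, x)
y_fourier = circulant_matvec_fourier(w, x)
```

Under these assumptions the layer stores n weights rather than n^2, and with a true FFT the product costs O(n log n) instead of O(n^2). How the paper handles unequal input and output sizes is not stated in the abstract and is not modeled here.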

Accurate Latency Prediction of Deep Learning Model Inference under Dynamic Runtime Resource

ABSTRACT. Accurate prediction of inference time can effectively accelerate the design and deployment of Deep Neural Network (DNN) models, and provide hints for process scheduling in intelligent systems. However, existing prediction methods do not consider the allocation of process resources during model inference, resulting in a deviation from the true time. To this end, a hardware-aware prediction method based on double feature embedding (HADE) for deep learning model inference time is proposed. HADE builds atop two key techniques: (i) double feature embedding uses the switchable operator-wise Multilayer Perceptron and GNN-based graph embedding to encode node features extracted from the model computation graph sequentially; and (ii) the Resource Aware Latency (RAL) formula predicts the model inference time under arbitrary hardware resource allocations. The experiments show that the accuracy of standard inference time prediction can reach 96.4%, and that of resource-dependent inference time prediction can reach 79.7%.

Sentiment Analysis Based on Pre-trained Language Models: Recent Progress

ABSTRACT. Pre-trained Language Models (PLMs) can be applied to downstream tasks with only fine-tuning, eliminating the need to train the model from scratch. In particular, PLMs have been utilised for Sentiment Analysis (SA), a process that detects, analyses, and extracts the polarity of text sentiments. To help researchers comprehensively understand the existing research on PLM-based SA, identify gaps, establish context, acknowledge previous work, and learn from methodologies, we present a literature review on the topic in this paper. Specifically, we briefly describe the motivation of each method, offer a concise overview of these methods, compare their pros, cons, and performance, and identify the challenges for future research.

Domain-Invariant Task Optimization for Cross-domain Recommendation

ABSTRACT. The challenge of cold start has long been a persistent issue in recommender systems. However, Cross-domain Recommendation (CDR) provides a promising solution by utilizing the abundant information available in the auxiliary source domain to facilitate cold-start recommendations for the target domain. Many existing popular CDR methods only use overlapping user data but ignore non-overlapping user data when training the model to establish a mapping function, which reduces the model's generalization ability. Furthermore, these CDR methods often directly learn the target embedding during training. Because the target embedding itself may be unreasonable, the transformed embedding may also be unreasonable, which exacerbates the difficulty of model generalization. To address these issues, we propose a novel framework named Domain-Invariant Task Optimization for Cross-domain Recommendation (DITOCDR). To effectively utilize non-overlapping user information, we employ source and target domain autoencoders to learn overlapping and non-overlapping user embeddings and extract domain-invariant factors. Additionally, we use a task-optimized strategy for target embedding learning to optimize the embedding and implicitly transform the source domain user embedding to the target feature space. We evaluate our proposed DITOCDR on three real-world datasets collected by Amazon, and the experimental results demonstrate its excellent performance and effectiveness.

Adversarial Example Detection with Latent Representation Dynamic Prototype

ABSTRACT. In the realm of Deep Neural Networks (DNNs), one of the primary concerns is their vulnerability in adversarial environments, whereby malicious attackers can easily manipulate them. As such, identifying adversarial samples is crucial to safeguarding the security of DNNs in real-world scenarios. In this work, we propose a method of adversarial example detection. Our approach uses a Latent Representation Dynamic Prototype to sample more generalizable latent representations from a learnable Gaussian distribution, which relaxes the detection dependency on the nearest neighbor's latent representation. Additionally, we introduce Random Homogeneous Sampling (RHS) to replace KNN for sampling reference samples, resulting in a lower inference time complexity of $O(1)$. Lastly, we use cross-attention in the adversarial discriminator to capture the evolutionary differences of latent representation in benign and adversarial samples by comparing the latent representations from inference and reference samples globally. We conducted experiments to evaluate our approach and found that it performs competitively in the gray-box setting against various attacks with two $\mathcal{L}_p$-norm constraints for CIFAR-10 and SVHN datasets. Moreover, our detector trained with PGD attack exhibited detection ability for unseen adversarial samples generated by other adversarial attacks with small perturbations, ensuring its generalization ability in different scenarios.

TRFN: Triple-Receptive-Field Network for Regional-Texture and Holistic-Structure Image Inpainting

ABSTRACT. Image inpainting is a challenging task and has become a hot issue in recent years. Despite the significant progress of modern methods, it is still difficult to fill arbitrary missing regions with both vivid textures and coherent structures. Because of the limited receptive fields, methods centered on convolution neural networks only deal with regular textures but lose holistic structures. To this end, we propose a Triple-Receptive-Field Network (TRFN) that fuses local convolution features, global attention mechanism, and frequency domain learning in this study. TRFN is rooted in a concurrent structure that enables different receptive fields and retains local features and global representations to the maximum extent. By using the concurrent structure, TRFN captures effective representations and simultaneously generates detailed textures and holistic structures. Experiments demonstrate the efficacy of TRFN, and the proposed method achieves outstanding performance over the competitors.

A Data-free Substitute Model Training Method for Textual Adversarial Attacks

ABSTRACT. BERT and other pre-trained language models are vulnerable to textual adversarial attacks. While current transfer-based textual adversarial attacks in black-box environments rely on real datasets to train substitute models, obtaining the datasets can be challenging for attackers. To address this issue, we propose a data-free substitute training method (DaST-T) for textual adversarial attacks, which can train substitute models without the assistance of real data. DaST-T consists of two major steps. Firstly, DaST-T creates a special Generative Adversarial Network (GAN) to train substitute models without any real data. The training procedure utilizes samples synthesized at random by the generative model, where labels are generated by the attacked model. In particular, DaST-T designs a data augmenter for the generative model to facilitate rapid exploration of the entire sample space, thereby accelerating substitute model training. Secondly, DaST-T applies existing white-box textual adversarial attack methods to the substitute model to generate adversarial text, which is then transferred to the attacked model. DaST-T can effectively address the issue of limited access to real datasets in black-box textual adversarial attacks. Experimental results on text classification tasks in NLP show that DaST-T can achieve superior attack performance compared to other baselines of black-box textual adversarial attacks while requiring fewer sample queries.

Towards Analyzing the Efficacy of Multi-task Learning in Hate Speech Detection

ABSTRACT. Secretary-General António Guterres launched the United Nations Strategy and Plan of Action on Hate Speech in 2019, recognizing the alarming trend of increasing hate speech worldwide. Despite extensive research, benchmark datasets for hate speech detection remain limited in volume and vary in domain and annotation. In this paper, we investigate the following research objectives: (a) a performance comparison between multi-task models and single-task models; (b) a performance study of different multi-task models (fully shared, shared-private) for hate speech detection, considering each individual dataset as a separate task; (c) the effect of using different combinations of available existing datasets on performance in multi-task settings. A total of six datasets that contain offensive and hate speech on account of race, sex, and religion are considered for the above study. Our analysis suggests that a proper combination of datasets in a multi-task setting can overcome data scarcity and develop a unified framework.

Intelligent trajectory tracking control of unmanned parafoil system based on SAC optimized LADRC

ABSTRACT. The unmanned parafoil system has become increasingly popular in a variety of military and civilian applications due to its remarkable carrying capacity, as well as its ability to modify its flight path by adjusting the left or right paracord. To ensure the unmanned system's safe completion of its flight mission, precise trajectory tracking control of the parafoil is essential. This paper presents an intelligent trajectory tracking approach that employs linear active disturbance rejection control (LADRC) optimized by a soft actor-critic (SAC) algorithm. Using the eight-degree-of-freedom (DOF) parafoil model as a basis, we have developed a trajectory tracking guidance law to address the underactuated problem. To ensure that the system's yaw angle accurately tracks the guided yaw angle, we have designed a second-order LADRC. Additionally, the SAC algorithm is used to obtain adaptive parameters for the controller, ultimately enhancing tracking performance. Simulation results show that the proposed method can overcome the wind disturbance and achieve the convergence of tracking errors.
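As a rough illustration of the second-order LADRC mentioned in the abstract above, the following minimal sketch combines a linear extended state observer with a PD-style control law, using the common bandwidth parameterization. Everything specific here is an assumption made for illustration: the toy plant (not the 8-DOF parafoil model), the constant disturbance, the fixed gains `omega_c` and `omega_o` (the paper tunes parameters with SAC, which is not shown), and the function name `simulate_ladrc`.

```python
def simulate_ladrc(r=1.0, omega_c=10.0, omega_o=50.0, b0=1.0,
                   dt=0.001, t_end=5.0):
    """Track a constant reference r with a second-order LADRC (Euler steps).

    Hypothetical toy plant: y'' = -y' + 0.5 + u, where 0.5 stands in for an
    unknown constant disturbance. Returns the final plant output.
    """
    # Controller gains from the controller bandwidth omega_c.
    kp, kd = omega_c ** 2, 2.0 * omega_c
    # Observer gains from the observer bandwidth omega_o.
    beta1, beta2, beta3 = 3.0 * omega_o, 3.0 * omega_o ** 2, omega_o ** 3

    y, dy = 0.0, 0.0            # plant state
    z1, z2, z3 = 0.0, 0.0, 0.0  # estimates of y, y', and total disturbance

    for _ in range(int(t_end / dt)):
        # Control law: PD on the estimated states, then cancel the
        # estimated total disturbance z3.
        u0 = kp * (r - z1) - kd * z2
        u = (u0 - z3) / b0

        # Toy plant dynamics (assumed, for illustration only).
        ddy = -1.0 * dy + 0.5 + 1.0 * u

        # Linear extended state observer driven by the output error.
        e = y - z1
        dz1 = z2 + beta1 * e
        dz2 = z3 + beta2 * e + b0 * u
        dz3 = beta3 * e

        # Forward-Euler integration of plant and observer.
        y += dt * dy
        dy += dt * ddy
        z1 += dt * dz1
        z2 += dt * dz2
        z3 += dt * dz3

    return y


if __name__ == "__main__":
    print(simulate_ladrc())
```

In a learning-based variant such as the one the abstract describes, quantities like `omega_c`, `omega_o`, or `b0` would be candidates for online adaptation; which parameters the authors actually adapt is not stated in the abstract.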

Integrating Multi-View Feature Extraction and Fuzzy Rank-Based Ensemble for Accurate HIV-1 Protease Cleavage Site Prediction

ABSTRACT. Acquired immunodeficiency syndrome (AIDS) continues to be a significant cause of mortality, disability, and economic repercussions, especially in underdeveloped countries. Extensive research has been conducted to develop effective therapies for human immunodeficiency virus (HIV) infection, including the prediction of HIV-1 protease cleavage sites. Accurate prediction of these sites can expedite the discovery of new HIV-1 protease inhibitors. Motivated by this, we propose a novel approach for HIV-1 protease cleavage site prediction using numerical descriptors based on octapeptide sequences. Our method incorporates multi-view feature extraction, combining sequence order effects of amino acids with physicochemical features. To capture important information, we utilize a convolutional neural network for feature extraction. For the classification task, we employ a fuzzy rank-based ensemble method, utilizing Random Forest, Logistic Regression, and Support Vector Machine as base classifiers. The ensemble combines their predictions to make the final prediction. Experimental evaluation on benchmark datasets demonstrates the effectiveness of our approach, achieving average Accuracy, AUC, Precision, Recall, and F-measure of 0.93, 0.95, 0.85, 0.76, and 0.80, respectively. Comparisons with existing studies confirm the potential of our proposed technique.

Joint Regularization Knowledge Distillation

ABSTRACT. Knowledge distillation is devoted to increasing the similarity between a small student network and an advanced teacher network in order to improve the performance of the student network. However, existing methods focus on teacher and student networks that receive supervision from each other independently and do not consider the network as a whole. In this paper, we propose a new knowledge distillation framework called Joint Regularization Knowledge Distillation (JRKD), which aims to reduce network differences through joint training. Specifically, we train teacher and student networks through joint regularization loss to maximize consistency between the two networks. Meanwhile, we develop a confidence-based continuous scheduler method (CBCS), which divides examples into center examples and edge examples based on the example confidence distribution of network output. Prediction differences between networks are reduced when training with center examples. Teacher and student networks will become more similar as a result of joint training. Extensive experimental results on benchmark datasets such as CIFAR-10, CIFAR-100, and Tiny-ImageNet show that JRKD outperforms many advanced distillation methods.

Correlation Guided Multi-Teacher Knowledge Distillation

ABSTRACT. Knowledge distillation is a model compression technique that transfers knowledge from a redundant and strong network (teacher) to a lightweight network (student). Due to the limitations of a single teacher's perspective, researchers advocate for the inclusion of multiple teachers to facilitate a more diverse and accurate acquisition of knowledge. However, current multi-teacher knowledge distillation methods consider only the integrity of the integrated knowledge at the teachers' level when assigning teacher weights, largely ignoring the student's preference for knowledge. This results in inefficient and redundant knowledge transfer, thereby limiting the learning effect of the student network. To more efficiently integrate teacher knowledge suitable for student learning, we propose Correlation Guided Multi-Teacher Knowledge Distillation (CG-MTKD), which utilizes feedback on the student's learning effects to integrate the student's preferred knowledge. Through extensive experiments on two public datasets, CIFAR-10 and CIFAR-100, we demonstrate that our method, CG-MTKD, can effectively integrate the student's preferred knowledge during teacher weight assignments.
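For readers unfamiliar with multi-teacher distillation, the sketch below shows a generic weighted soft-target loss: teacher distributions are mixed with per-teacher weights and the student is penalized by the KL divergence to that mixture. This is not the CG-MTKD weighting scheme; the abstract does not specify how the correlation-guided weights are computed, so `teacher_weights` is simply passed in, and the function names and the temperature value are assumptions for illustration.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def multi_teacher_kd_loss(student_logits, teacher_logits_list,
                          teacher_weights, temperature=4.0):
    """KL(weighted teacher mixture || student), averaged over the batch.

    teacher_weights is a hypothetical input: one non-negative weight per
    teacher, normalized here to sum to 1.
    """
    w = np.asarray(teacher_weights, dtype=float)
    w = w / w.sum()

    # Mix the teachers' softened distributions using the given weights.
    teacher_probs = sum(
        wi * softmax(t, temperature)
        for wi, t in zip(w, teacher_logits_list)
    )
    student_probs = softmax(student_logits, temperature)

    eps = 1e-12  # guards against log(0)
    kl = np.sum(
        teacher_probs
        * (np.log(teacher_probs + eps) - np.log(student_probs + eps)),
        axis=-1,
    )
    # The T^2 factor is the usual rescaling for softened targets.
    return float(temperature ** 2 * kl.mean())


if __name__ == "__main__":
    student = np.array([[1.0, 0.5, -0.2]])
    teachers = [np.array([[2.0, 0.1, -1.0]]), np.array([[1.5, 0.3, -0.5]])]
    print(multi_teacher_kd_loss(student, teachers, [0.7, 0.3]))
```

A student-aware method would replace the fixed `teacher_weights` with values derived from the student's feedback during training.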