ISAIR2022: THE 7TH INTERNATIONAL SYMPOSIUM ON ARTIFICIAL INTELLIGENCE AND ROBOTICS 2022
PROGRAM FOR SUNDAY, OCTOBER 23RD

09:00-18:00 Session 3

This program is scheduled in China Standard Time (CST).

ISAIR2022 Day3

Zoom Link:

Meeting ID: 875 1772 6133

Passcode: sent to your email.

Location: Main Room
09:00
Noise-robust Grasp Position Estimation Using PointNet
PRESENTER: Fujikame Ryoto

ABSTRACT. Robotic arms are currently used in many factories, and most of them are taught to perform the required actions. However, when the environment around the robotic arm changes, the object to be grasped changes, or other actions are required, the arm must be taught again. In general, we assign appropriate three-dimensional coordinates to the surrounding workspace and move the arm to the coordinates of the target object. LiDAR can obtain an object's three-dimensional coordinates accurately because it directly measures the distance from the sensor by emitting a laser beam at the object and receiving the reflected light. However, LiDAR has the drawback of being easily affected by external noise, such as ambient light, and by noise inside the sensor. To solve this problem, this paper applies the PointNet deep learning model to the 3D point cloud to eliminate the influence of noise and estimate the grasping position. Here, the PointNet network, normally used for object classification, has its output changed from (x, y, z) to six-degrees-of-freedom coordinates (x, y, z, roll, pitch, yaw) to estimate the object's pose. However, in the real world the error of pose estimation cannot be measured on a physical machine because ground-truth values do not exist. For this reason, I evaluated the accuracy of object pose estimation in simulation, where the true pose of the object can be obtained, and also evaluated its performance by performing pick-and-place with the actual robot using the estimated pose as the grasping position. This paper demonstrates the effectiveness of grasping position estimation using PointNet on three-dimensional point cloud data acquired by a LiDAR sensor in a robotic system.

09:10
PBLF: Prompt Based Learning Framework for Cross-modal Recipe Retrieval
PRESENTER: Jialiang Sun

ABSTRACT. Cross-modal recipe retrieval has drawn widespread attention due to the variety of food-related applications and growing concern about health. The task is addressed through a combination of multi-modal data (e.g., images and texts) and has far-reaching implications for the merging of vision and language. Early work focused on learning joint representations by projecting food images and recipe texts (e.g., ingredients and instructions) into the same embedding space and proposing different cross-modal fusion structures. Recently, most methods adopt a pre-train-and-fine-tune strategy to help capture the alignment between modalities. While offering appreciable retrieval performance, these methods still have two limitations: 1) as pre-trained models grow more complex, the data requirements and computational cost of the fine-tuning stage also rise; and 2) the downstream fine-tuning tasks designed for cross-modal recipe retrieval have a gap with the pre-trained model. To this end, we propose a novel fusion framework named Prompt Based Learning Framework (PBLF), which adapts the transferable vision-language model CLIP (Contrastive Language-Image Pre-training) to the recipe retrieval task for the first time and designs an appropriate prompt to train the model efficiently, bridging the gap between the pre-trained model and the downstream task and transferring CLIP's knowledge to the specific recipe retrieval task. Extensive experiments on the large-scale cross-modal recipe dataset Recipe1M demonstrate the superiority of our proposed PBLF model over state-of-the-art approaches.

09:20
A Malleable Boundary Network for Temporal Action Detection
PRESENTER: Tian Wang

ABSTRACT. Temporal action detection in untrimmed videos is a challenging task that aims to predict the boundary and category of action instances, with applications in transportation. In this paper, we propose a two-stage framework, the Malleable Boundary Network (MB-Net), to adaptively regress proposals based on finer scores. In particular, MB-Net consists of a Potential Boundary Generator in the first stage and an Adaptive Proposal Detector in the second stage. First, the Potential Boundary Generator fuses multiple sets of flexible score sequences to obtain tentative proposals from frame-level features in an anchor-free way. Then, the Adaptive Proposal Detector employs parallel modules to filter, classify, and regress proposals adaptively. In addition, we propose an easy-to-implement feature augmentation method, Structured Temporal Segment Pooling, which makes full use of the information throughout the whole proposal. Experiments show that MB-Net achieves state-of-the-art performance on the popular benchmarks THUMOS-14 and ActivityNet-1.3, with improvements of 1.9% and 1.2%, respectively.

09:30
Instance segmentation using an improved Mask R-CNN
PRESENTER: Kouta Nagasawa

ABSTRACT. In 2012, AlexNet, a deep learning-based method, won the ILSVRC image recognition competition by a large margin over other methods. Since then, deep learning-based methods have taken center stage in image recognition and have been widely used in agriculture, factory automation, automated driving, and medicine. In the automated driving and medical fields, the accuracy of image recognition directly affects human lives, so the importance of improving it is clear. In this research, I study segmentation, especially instance segmentation. The method used is Mask R-CNN, which is the basis of current state-of-the-art methods. Its network structure includes ResNet, and I tried to improve accuracy by adding a Squeeze-and-Excitation Block (SE Block) to it.
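The SE Block mentioned above has a simple mechanism: squeeze each channel to one number by global average pooling, pass it through a small bottleneck ending in a sigmoid, and rescale the channels. The following is an illustrative numpy sketch with untrained random weights, not the paper's trained module.

```python
import numpy as np

def se_block(feature_map, w_squeeze, w_excite):
    """Squeeze-and-Excitation block (a sketch; weights are untrained).

    feature_map: (C, H, W). Squeeze: global average pooling per channel.
    Excitation: a two-layer bottleneck ending in a sigmoid, producing one
    scale factor per channel, which reweights the original features.
    """
    z = feature_map.mean(axis=(1, 2))             # squeeze: (C,)
    s = np.maximum(z @ w_squeeze, 0.0)            # reduction + ReLU
    s = 1.0 / (1.0 + np.exp(-(s @ w_excite)))     # expansion + sigmoid
    return feature_map * s[:, None, None]         # channel reweighting

rng = np.random.default_rng(1)
C, r = 16, 4                                      # channels, reduction ratio
x = rng.normal(size=(C, 8, 8))
w_sq = rng.normal(size=(C, C // r)) * 0.1
w_ex = rng.normal(size=(C // r, C)) * 0.1
y = se_block(x, w_sq, w_ex)
print(y.shape)
```

In a real ResNet, this block would sit after the convolutions inside each residual branch, so the scale factors are learned jointly with the backbone.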

09:40
Impedance sliding mode control for lower limb rehabilitation exoskeleton system
PRESENTER: Xinyu Zhu

ABSTRACT. Improving working stability and wearing comfort is very important for lower limb exoskeleton systems, as it increases the rehabilitation effect for patients with lower limb injuries. However, due to the complex structure of the lower limb exoskeleton, there are uncertain disturbances in the system. Traditional control methods cannot meet the requirements of dynamic response and robustness, and there are still many shortcomings in the safety and compliance of rehabilitation exoskeletons. In this paper, an impedance sliding mode control strategy for the lower limb exoskeleton is proposed. The safety of the contact force is considered, and the impedance controller is combined with an improved integral terminal sliding mode controller (ITSMC) to reduce the system's errors and ensure a fast response. Stability is analyzed with a Lyapunov function. Simulation results show that the combination of the sliding mode controller and the impedance controller performs well, and the entire system has good trajectory-tracking behavior.

09:50
Attention-based Dynamic Graph CNN for Point Cloud Classification
PRESENTER: Junfei Wang

ABSTRACT. In this paper, we propose an attention-based dynamic graph CNN method for point cloud classification. We introduce an efficient channel attention module into each edge convolution block of dynamic graph CNN (DGCNN) to obtain more discriminative and stable features. Our experimental results show that, compared with DGCNN, our method improves not only the classification accuracy but also the stability.
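The efficient channel attention (ECA) idea referenced above replaces the SE-style bottleneck with a small 1-D convolution across the channel descriptor, avoiding dimensionality reduction. Below is a hedged numpy sketch for per-point features; the averaging kernel stands in for the learned 1-D convolution weights, and `eca` is a name chosen here for illustration.

```python
import numpy as np

def eca(features, k=3):
    """Efficient Channel Attention (ECA), sketched for point features.

    features: (C, N) per-point features from an edge-conv block.
    ECA avoids dimensionality reduction: it applies a size-k 1-D
    convolution across the channel descriptor, then a sigmoid gate.
    """
    z = features.mean(axis=1)                     # per-channel GAP: (C,)
    pad = k // 2
    zp = np.pad(z, pad, mode="edge")
    kernel = np.ones(k) / k                       # stand-in for a learned 1-D conv
    conv = np.convolve(zp, kernel, mode="valid")  # local cross-channel interaction
    gate = 1.0 / (1.0 + np.exp(-conv))            # sigmoid gate in (0, 1)
    return features * gate[:, None]

rng = np.random.default_rng(5)
feats = rng.normal(size=(32, 1024))               # 32 channels, 1024 points
out = eca(feats)
print(out.shape)
```

In a DGCNN, one such gate per edge-convolution block reweights channels before features flow to the next graph construction step.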

10:00
Cs-YOWO: Human behavior detection model based on attention mechanism
PRESENTER: Zhihong Li

ABSTRACT. Human behavior detection plays an important role in traffic supervision, medical supervision, and social security applications. This report puts forward a new model, Cs-YOWO, which improves on the original video behavior detection model YOWO. By adding a Selective Kernel Network and a Convolutional Block Attention Module, the model addresses two problems of the original: insufficient image feature extraction by the convolutional neural network, and imperfect fusion of spatial and temporal information in the feature fusion stage. In tests on the UCF101-24 and JHMDB datasets, frame-level and video-level average accuracy improved by 1.1% over the original model; on UCF101-24, the classification accuracy improved by 1.9%. The experimental results show that the improved method can fully extract video information and enhance detection performance.

10:10
Research on server activity of power grid system based on deep learning
PRESENTER: Longfei Zhou

ABSTRACT. Activity patterns vary widely across the network services running on power grid servers, and traditional single-method network traffic forecasting cannot accurately predict the traffic of some of these services. To better predict grid server activity, this paper proposes a deep learning-based activity prediction algorithm for grid system servers with higher accuracy and adaptability than traditional methods. The algorithm first analyzes grid server traffic data and proposes a three-dimensional network node activity measure based on the total number of bytes, the total number of data packets, and the number of visits. Then the future activity of each network service is predicted with different deep learning models; the optimal model is selected through evaluation and comparison and saved on the server for real-time use. The proposed algorithm can accurately predict the future activity of grid system servers and adapts well to prediction tasks with different traffic characteristics.
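One way to read the three-dimensional activity measure above is as a composite score over (bytes, packets, visits). The abstract does not give the exact formula, so the following is only an illustrative guess: min-max normalize each dimension across nodes, then take a weighted average.

```python
import numpy as np

def activity_score(bytes_total, packets_total, visits, weights=(1/3, 1/3, 1/3)):
    """Combine the three activity dimensions into one score per node.

    Illustrative composite only; the paper's exact formula is not given.
    Each dimension is min-max normalized across nodes, then averaged.
    """
    feats = np.stack([bytes_total, packets_total, visits], axis=1).astype(float)
    lo, hi = feats.min(axis=0), feats.max(axis=0)
    norm = (feats - lo) / np.where(hi > lo, hi - lo, 1.0)  # guard zero range
    return norm @ np.asarray(weights)

b = np.array([1e6, 5e8, 2e7])   # total bytes per node
p = np.array([1e3, 4e5, 6e4])   # total packets per node
v = np.array([10, 900, 120])    # visit counts per node
s = activity_score(b, p, v)
print(s)
```

Whatever the real formula, normalization matters here: bytes, packets, and visits live on very different scales, so a raw sum would be dominated by the byte count.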

10:20
Network traffic classification method of power system based on DNN and K-means
PRESENTER: Yiming Sun

ABSTRACT. Classifying power system network traffic is a key part of power system network security protection. We therefore propose a power system network traffic classification method based on a deep learning algorithm and the K-means clustering algorithm. Our method uses a deep neural network (DNN) to classify preprocessed network traffic data to determine whether it comes from a server, and then applies the K-means algorithm to perform a secondary classification on the data identified as server traffic, helping users identify the specific server type. Experiments with real power system network data show that this method classifies with high accuracy and can meet the needs of power system network traffic classification in real environments.
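The two-stage pipeline above can be sketched end to end. In this hedged sketch, a placeholder decision function stands in for the trained DNN of stage one, and stage two is a plain numpy K-means over the flows flagged as servers; the data and names (`two_stage_classify`, `is_server_fn`) are invented for illustration.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means used as the second-stage clusterer (a sketch)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

def two_stage_classify(flows, is_server_fn, k=2):
    """Stage 1: a binary server/non-server decision (placeholder for the
    trained DNN). Stage 2: K-means over server-flagged flows to recover
    the server type."""
    server_mask = np.array([is_server_fn(f) for f in flows])
    servers = flows[server_mask]
    labels, _ = kmeans(servers, k)
    return server_mask, labels

rng = np.random.default_rng(2)
# synthetic traffic features: two server clusters plus background flows
flows = np.vstack([rng.normal(0, 0.3, (30, 2)),
                   rng.normal(5, 0.3, (30, 2)),
                   rng.normal([2.5, -4], 0.3, (40, 2))])
mask, types = two_stage_classify(flows, lambda f: f[1] > -2, k=2)
print(mask.sum(), len(types))
```

The design point is that supervised stage one only needs a binary label (server or not), which is cheap to obtain, while unsupervised stage two discovers server types without labels.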

10:30
Pelvic Segmentation Based on MultiR2UNet
PRESENTER: Zhaokai Kong

ABSTRACT. Extracting pelvic structure quickly and accurately from pelvic MRI images helps doctors diagnose and analyze diseases of the pelvic area in time, but extracting skeletal contours from such images is both time-consuming and imprecise. This paper therefore proposes an improved image segmentation algorithm based on MultiR2UNet. We adopt R2UNet, which is accurate in the segmentation field, as the backbone network. Residual connections are used in the network's skip layers, and MultiRes Blocks are used in up-sampling, which helps increase the depth of the network and extract more detailed features. Because pelvic training samples are few and imbalanced, we performed data augmentation in the preprocessing stage, effectively enlarging the data samples. In the training phase, we propose a mixed loss function. After several rounds of training and testing, the gap between the pelvic segmentation produced by our algorithm and the ground-truth labels is fairly small, with an overlap of about 91%, and the average segmentation time per image is about 0.012 s. The experimental results show that the proposed algorithm guarantees segmentation accuracy; MultiR2UNet is an effective real-time pelvis segmentation algorithm.

10:40
MakeupGAN: Controlled Makeup Editing with Deep Generative Adversarial Network
PRESENTER: Qiurun Cai

ABSTRACT. Although GANs (Generative Adversarial Networks) have performed well in image generation, previous studies usually explored how to interpolate in the latent space to make image generation smoother, without considering whether the semantic attributes are controllable. This fails to fully control the semantic meaning of the generated images, as many tasks require. In this work, we propose a new framework for makeup face editing, called MakeupGAN, which decouples semantic entanglement in the latent space of a GAN, learns semantic features of makeup, and achieves more accurate control of images' semantic attributes. Experimental results show that our method can generate accurate and controllable face states, where each state represents a different level of makeup, enabling smooth editing and switching between makeup levels.

10:50
Featured Deep Object Pose Estimation for Intelligent Transportation Environmental Information Recognition
PRESENTER: Xiu Chen

ABSTRACT. Autonomous driving technology needs to understand the pose of objects surrounding the vehicle, but complex scenes and occlusion between objects make this challenging. In this paper, we propose a lightweight convolutional neural network for pose estimation that greatly reduces the number of network parameters while maintaining detection performance. Specifically, we improve the feature extraction part of the original network: we use a ResNet (Residual Network) pre-trained on ImageNet to extract image features and pass the feature vector to the branch network for further processing. In the branch network, we replace the original convolutional layer with a 7×7 kernel by three convolutional layers with 3×3 kernels. The improved network has 38.7% of the original network's parameters. We trained on synthetic data, and experimental results show that the improved network maintains good performance on test data and accurately predicts the pose of objects in the scene.
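The arithmetic behind the branch-network change above is worth seeing: three stacked 3×3 convolutions cover the same 7×7 receptive field with 27/49 ≈ 0.55 of the weights (per layer, same channel width, biases ignored). The 38.7% figure in the abstract refers to the whole network; the ratio below applies only to the replaced layer, and the channel width is illustrative.

```python
# Weight count of a k x k convolution with c_in input and c_out output channels.
def conv_params(kernel, c_in, c_out):
    return kernel * kernel * c_in * c_out

c = 128                                   # illustrative channel width
single_7x7 = conv_params(7, c, c)         # one 7x7 layer
triple_3x3 = 3 * conv_params(3, c, c)     # three stacked 3x3 layers
print(single_7x7, triple_3x3, triple_3x3 / single_7x7)
```

The stacked form also interleaves extra nonlinearities between the 3×3 layers, which is a second common motivation for this substitution.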

11:00
Analysis And Control For Bilateral Teleoperation Systems With Part Active Power
PRESENTER: Jing Bai

ABSTRACT. Bilateral teleoperation systems are widely used in daily life and many fields. Generally, both master and slave manipulators are assumed to be passive devices. However, master manipulators may introduce partly active power into the overall network due to the active haptic device. In this paper, we study a bilateral system with partly active power: the stability condition is analyzed, and a PD bilateral controller is proposed on the basis of a four-channel two-wave network. Numerical simulations demonstrate the efficiency of the proposed method.

11:10
Invariant EKF Based Mobile Robot Simultaneous Localization and Mapping Navigation
PRESENTER: Chaoyue Gu

ABSTRACT. When robots suffer from magnetic field disturbances, they cannot rely on satellite navigation such as GPS to explore wide unknown areas. In this situation, Simultaneous Localization and Mapping is an efficient way to obtain a global map from sensors or vision. However, noise and other disturbances are inevitable. In this paper, we propose an Invariant Extended Kalman Filter (IEKF) based navigation method for a single mobile robot. The proposed method is simulated on the MATLAB platform, and the results show that it performs well.

11:20
GGM-Net: Gradient Constraint on Multi-Category Brain MRI Segmentation
PRESENTER: Yuanyuan Wang

ABSTRACT. In the diagnosis and treatment of brain tumors, the position, shape, and size of the tumor are key factors. Existing deep learning methods have strong advantages in medical image analysis. However, for multi-category brain tumor segmentation, the characteristics of gliomas (diffuse infiltration and growth, wide extent, unclear boundaries, and swelling of brain tissue in the affected area) lead to poor segmentation performance near intersection areas. Most medical image segmentation methods extract the region of interest based on the image's gray-level information rather than introducing gradient information. In addition, the complexity of multi-modality medical images and the large differences between brain tumor areas make it difficult to segment brain tumors efficiently and accurately. To solve these problems, we propose a gradient-guided multi-category brain tumor segmentation method (GGM-Net). The proposed algorithm includes three branches: (1) a Dual-ConvD encoding branch that captures rich features from multi-modality MRI; (2) a gradient-detecting branch that generates gradient features to assist area segmentation; and (3) a multi-category segmentation decoding branch that effectively fuses contour information with the encoded features. We evaluated the effectiveness of the proposed algorithm on the Brain Tumor Segmentation Challenge (BraTS) 2020 dataset, using 295 cases for training and 74 cases for testing. The Dice Similarity Coefficients (DSC) of the proposed algorithm on whole tumor (WT), tumor core (TC), and enhancing tumor (ET) are 0.9010, 0.8289, and 0.7543, respectively, with an average Dice of 0.8278. The experimental results show that the algorithm can be successfully applied to brain tumor segmentation and is helpful for doctors in diagnosis and treatment.
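The gradient information the abstract contrasts with gray-level information can be illustrated with the classical Sobel operators: a gradient map responds at tissue boundaries rather than to absolute intensity. This is only a hand-crafted stand-in for the kind of feature a gradient-detecting branch consumes; the paper's branch is learned, and `sobel_gradient` is a name chosen here.

```python
import numpy as np

def sobel_gradient(img):
    """Gradient magnitude via Sobel cross-correlation (dense sketch)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    H, W = img.shape
    padded = np.pad(img, 1, mode="edge")
    gx = np.zeros((H, W))
    gy = np.zeros((H, W))
    for di in range(3):                      # accumulate the 3x3 stencil
        for dj in range(3):
            patch = padded[di:di + H, dj:dj + W]
            gx += kx[di, dj] * patch
            gy += ky[di, dj] * patch
    return np.hypot(gx, gy)

# a step edge: the gradient is large only near the boundary column
img = np.zeros((8, 8))
img[:, 4:] = 1.0
g = sobel_gradient(img)
print(g[4, 0], g[4, 4])
```

A segmentation network that sees such a map alongside the raw intensities gets an explicit boundary cue, which is exactly what helps near the unclear glioma intersections described above.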

11:30
Linear Split Attention for Pavement Crack Detection
PRESENTER: Guoliang Yan

ABSTRACT. The number of vehicles on the road is increasing rapidly, roads are aging faster than ever, and cracks are an early manifestation of road aging. Timely and accurate crack detection can avoid high maintenance costs, but improving the accuracy of deep learning-based crack detection is difficult because of noise and complex crack edges. To solve these problems, this paper takes VGG16 as the backbone network and removes the last pooling and fully connected operations, retaining crack features in the deep information. To extract more location information and linear features from the deep feature maps, we designed a Linear Split Attention Module (LSAM). A Multi-scale Feature Fusion Module (MFFM) was designed for the last layer to capture deeper information and strengthen the connection between high-level and low-level features. The upsampling module is then optimized, with different upsampling methods used in the high-level and low-level convolutional layers to improve the detection accuracy and efficiency of the network. Finally, compared with five other methods on two datasets, our method's prediction accuracy on the DeepCrack dataset improved across the board and was superior to other studies, and it achieves better recall on the CrackForest dataset.

11:40
Information Acquisition and Feature Extraction of Motor Imagery EEG
PRESENTER: Chen Ma

ABSTRACT. The brain-computer interface (BCI) is a new interaction model that directly connects the human or animal brain with external devices and has a wide range of application scenarios. Through BCI technology based on electroencephalography (EEG) signals, communication with and control of external devices can be realized independently of the peripheral nervous system and muscle tissue. Motor imagery (MI) is a process in which people imagine their limbs or muscles moving in order to control external auxiliary devices (wheelchairs, robotic arms, robots, etc.), so that people without motor ability can restore their communication and motor ability to a certain extent. This paper first introduces EEG and EEG signal acquisition, then presents in detail the analysis methods and research content of motor imagery-based EEG signal preprocessing, feature extraction, and feature classification. Finally, BCI technology based on motor imagery is summarized, and future prospects are discussed.

11:50
Research on Mixed-Criticality Task Scheduling Method Based on Comprehensive Impact Factor
PRESENTER: Shujuan Huang

ABSTRACT. This paper proposes a mixed-criticality task scheduling method based on a comprehensive impact factor, to address the problem that ignoring or completely abandoning the services of low-criticality tasks when the system's criticality level is raised fails to meet users' needs. By considering several factors such as criticality, task utilization, and idle windows, the system can provide certain service guarantees for low-criticality tasks after the criticality level is raised; that is, low-criticality tasks can be scheduled cooperatively while the completion of high-criticality tasks is ensured, increasing the proportion of user needs that are fulfilled. The experimental results show that, compared with the CAPA and FBTWP-CFP algorithms, this method improves the schedulability of low-criticality tasks by about 15% and reduces the preemption and migration ratio of tasks by about 20%.

12:00
MLRA-Net: A Multi-scale Transformer and CNNs Network for Stroke Lesion Segmentation
PRESENTER: Zelin Wu

ABSTRACT. Medical image segmentation is widely used in diagnostic medical imaging and computer-assisted interventions, such as the segmentation of ischemic stroke lesions. Accurate lesion segmentation can help doctors complete a clinical diagnosis in time, so as to act within the critical treatment period and improve the efficiency of stroke diagnosis. Deep learning methods, especially convolutional neural networks (CNNs), are commonly used for ischemic stroke lesion segmentation. However, due to the limited receptive field of the convolution operation, it is difficult for CNNs to establish long-range dependencies. In this paper, we aim to capture local-global semantic features at different scales for lesion segmentation. To this end, we propose the Multi-scale Long-range and Regional Attention Network (MLRA-Net), which not only uses convolution layers to build local features through a patch partition block but also adopts transformers to encode tokenized image patches and extract global information at multiple scales. MLRA-Net uses transformers in multiple hierarchies to establish spatial features of different scales and redirects shallow features to the upsampling recovery path through skip connections. Our experimental results on the ATLAS dataset demonstrate that, compared with existing benchmark methods (U-Net, CLCI-Net, TransUNet, and TransFuse), MLRA-Net delivers strong segmentation performance with 60.48% DSC and 14.51 mm HD, an improvement of 7.56% in DSC and 25.36% in HD over TransUNet.

12:10
Abnormal recognition of wind turbine generator based on SCADA data analysis using CNN and LSTM with kernel principal component analysis
PRESENTER: Anfeng Zhu

ABSTRACT. The complex and changeable working environment of wind turbines often challenges condition monitoring and abnormality recognition. In this paper, a new method is proposed for abnormality recognition in wind turbine generators, in which a convolutional neural network (CNN) is cascaded to a long short-term memory network (LSTM) based on kernel principal component analysis (KPCA). Firstly, the quartile method is used to preprocess the SCADA data, deleting abnormal data and improving data quality. Then, with input variables selected by the Pearson correlation coefficient, KPCA eliminates the nonlinearity of the process variables and improves the generalization ability of the algorithm. A CNN-LSTM state recognition model based on KPCA is established from the principal components extracted by KPCA. The model warns of abnormal generator states through the prediction residual: when the residual repeatedly exceeds a threshold, the operating state is judged abnormal. Finally, the state of a wind turbine generator is predicted in a case study to verify the effectiveness of the method.
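The quartile preprocessing step above is the standard interquartile-range (IQR) rule. A minimal numpy sketch, assuming the common k = 1.5 fence (the paper may use a different multiplier):

```python
import numpy as np

def quartile_filter(x, k=1.5):
    """Quartile (IQR) rule for dropping abnormal SCADA samples.

    Values outside [Q1 - k*IQR, Q3 + k*IQR] are removed.
    """
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    mask = (x >= q1 - k * iqr) & (x <= q3 + k * iqr)
    return x[mask]

# toy SCADA channel with two obvious outliers (55.0 and -30.0)
data = np.array([10.1, 9.8, 10.3, 10.0, 9.9, 55.0, 10.2, -30.0])
clean = quartile_filter(data)
print(clean)
```

Because quartiles are robust statistics, the fences themselves are barely moved by the outliers they are meant to reject, which is why this rule is popular for raw sensor data.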

12:20
Research on Wireless Spectrum Allocation and Utility Evaluation of Cross-domain Multi-slot Radio
PRESENTER: Xueqing Li

ABSTRACT. Existing spectrum-use authorization adopts a large-period, large-area allocation mode in which users hold exclusive rights to the spectrum, leaving spectrum idle across many time slots and regions and resulting in low utilization. This paper proposes a Cross-domain Multi-slot Wireless Spectrum Allocation Mechanism (CMWSA) from a fine-grained spatiotemporal perspective. The bilateral spectrum auction model includes the buyer's location, start and end times, and interference radius in the auction information, and a virtual grouping algorithm for multi-spectrum buyers is designed. By classifying the spectrum and constructing an interference matrix, a cross-domain, multi-slot, spatiotemporally reusable wireless spectrum auction mechanism is proposed to improve the utilization of idle spectrum and increase user satisfaction. Simulation results show that CMWSA can improve the spectrum utilization rate of secondary users, meet their diverse needs, and improve the spectrum reuse rate.

12:30
LayoutLM-Critic: Multimodal Language Model for Text Error Correction of Optical Character Recognition
PRESENTER: Qinkun Xu

ABSTRACT. Recently, many approaches have been proposed to correct grammatical errors. Among them, LM-Critic (language model critic) achieved great success: it uses a language model to judge whether a sentence is grammatical and then uses the Break-It-Fix-It (BIFI) framework to fix the broken sentence. However, it does not work in multimodal scenarios, since the text is usually not a complete sentence and the errors are often not grammatical. Besides, because of noise in scanned images, there are always inevitable recognition mistakes even when using the best Optical Character Recognition (OCR) engine, and some of these are intolerable, such as errors in a receipt's date or total amount. It is therefore essential to introduce an error correction system to fix OCR results. Inspired by LayoutLMv2 (Layout Language Model version 2), which introduces a pre-training task to align text and image, we present LayoutLM-Critic, a critic that assesses how well a sentence matches the bounding box and image of a visually rich document.

12:40
Blind image deblurring via fast local extreme intensity prior

ABSTRACT. Blind image deblurring is a challenging problem in low-level computer vision that aims to recover the blur kernel and latent sharp image from a single blurry input. In recent years, channel priors such as the dark channel prior and the extreme channel prior have shown excellent results. However, high computational cost and approximate sub-problem solutions have limited the performance of these models. In this paper, a novel fast local extreme intensity prior (LEP) based on the maximum a posteriori (MAP) framework is presented for kernel estimation. The LEP is inspired by the observation that blur damages the local extreme intensities of an image patch. Moreover, we show the LEP is sparser in clear images than in blurred ones; this change in sparsity motivated us to build a kernel estimation model on the LEP. Then, unlike the traditional half-quadratic splitting optimization strategy, an effective and fast optimization algorithm is developed for this non-convex nonlinear problem. Experimental results on image sets show that the proposed algorithm is superior to state-of-the-art methods.
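The observation driving the prior above, that blur damages local extreme intensities, is easy to demonstrate numerically. This hedged sketch computes patch-wise local minima/maxima and applies a box blur standing in for an unknown blur kernel; it illustrates the observation only, not the paper's MAP model or its fast solver.

```python
import numpy as np

def local_extremes(img, patch=5):
    """Per-pixel min and max over a patch neighborhood (dense sketch)."""
    H, W = img.shape
    p = patch // 2
    padded = np.pad(img, p, mode="edge")
    mins = np.empty_like(img)
    maxs = np.empty_like(img)
    for i in range(H):
        for j in range(W):
            win = padded[i:i + patch, j:j + patch]
            mins[i, j] = win.min()
            maxs[i, j] = win.max()
    return mins, maxs

def box_blur(img, k=5):
    """Simple box blur standing in for an unknown blur kernel."""
    p = k // 2
    padded = np.pad(img, p, mode="edge")
    out = np.zeros_like(img)
    for di in range(k):
        for dj in range(k):
            out += padded[di:di + img.shape[0], dj:dj + img.shape[1]]
    return out / (k * k)

rng = np.random.default_rng(3)
sharp = (rng.random((32, 32)) < 0.1).astype(float)   # sparse bright spots
blurred = box_blur(sharp)
_, max_sharp = local_extremes(sharp)
_, max_blur = local_extremes(blurred)
# blur averages away the peaks, so the local maxima shrink
print(max_sharp.max(), max_blur.max())
```

The shrinkage of the extreme map under blur is exactly the sparsity change that the kernel estimation model exploits.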

12:50
STRDD: Scene Text Removal with Diffusion Probabilistic Models
PRESENTER: Wentao Yang

ABSTRACT. Scene text removal (STR) aims to erase text in the wild and fill the erased regions with visually plausible content. Because text in the wild often sits on a complex background, existing methods fail to replace text regions with a visually plausible background; tackling this challenge requires the model to learn the distribution of a large amount of data. Inspired by the successful adoption of score-based diffusion models in image generation, we propose a new two-stage text erasing approach termed STRDD. STRDD contains two modules: an autoencoder and an SDE. The autoencoder encodes the image into features and reconstructs the image from them. The SDE learns the distribution of features produced by the encoder and uses non-text regions of the image as conditions to turn text regions into background. Experimental results on a real-world dataset demonstrate that STRDD removes text in the wild well and improves on all STR baselines.

13:00
Geometry-Aware Network for Table Structure Recognition in the Wild
PRESENTER: Baoyu Xu

ABSTRACT. Tables are a common format for recording and summarizing important data in daily life. Table detection, cell detection, and table structure recognition have been widely discussed in recent years. Recognition of table structure in clean, noiseless images or documents has achieved good results, but on real-world distorted images with noise, existing methods do not perform well. The reason is that real-world images contain various distortions, such as bending, which cause models to fail to parse tables correctly. To solve this problem, we propose a geometry-aware network that strengthens the model when facing real-world images containing distortion.

13:10
A differential evolution algorithm with adaptive population size reduction strategy

ABSTRACT. To address the problem that the differential evolution algorithm easily falls into local optima and converges prematurely, a new differential evolution algorithm with an adaptive population size reduction strategy (APRDE) is proposed. Firstly, in the mutation and crossover operations, to balance the algorithm's local exploitation and global exploration capabilities, a parameter-adaptive tuning scheme based on the hyperbolic tangent function and the Cauchy distribution adaptively adjusts the parameter factors. Secondly, an ordered mutation strategy guides the direction of mutation and enriches the diversity of the population. Lastly, after each iteration, the population size is adaptively reduced according to the error between individuals' fitness values and the current optimum. The proposed algorithm is compared with five other optimization algorithms on eight typical benchmark functions. The results show a great improvement in solution accuracy, stability, and convergence speed.
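The tanh/Cauchy parameter scheme above can be sketched concretely. The exact schedule below is an illustrative guess, not the paper's formula: a hyperbolic-tangent decay keeps the DE scale factor F large early (exploration) and small late (exploitation), while the crossover rate CR is drawn from a heavy-tailed Cauchy distribution and clipped to (0, 1).

```python
import numpy as np

def adaptive_F(gen, max_gen, f_lo=0.2, f_hi=0.9):
    """Tanh schedule for the DE scale factor F (illustrative guess)."""
    t = 4.0 * gen / max_gen - 2.0                     # map generation to [-2, 2]
    return f_lo + (f_hi - f_lo) * (1.0 - np.tanh(t)) / 2.0

def cauchy_CR(loc=0.5, scale=0.1, rng=None):
    """Crossover rate from a Cauchy distribution, clipped to [0, 1].

    Heavy tails occasionally produce very different CR values,
    which helps escape local optima.
    """
    if rng is None:
        rng = np.random.default_rng()
    cr = loc + scale * rng.standard_cauchy()
    return float(np.clip(cr, 0.0, 1.0))

rng = np.random.default_rng(4)
print(adaptive_F(0, 100), adaptive_F(100, 100))       # large early, small late
print([round(cauchy_CR(rng=rng), 3) for _ in range(3)])
```

In a full APRDE loop, these two values would be recomputed each generation before the mutation and crossover steps.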

13:20
Tracking and matching of track closed loop sequence wide baseline image feature
PRESENTER: Shelei Li

ABSTRACT. In the three-dimensional reconstruction of unmanned aerial vehicle (UAV) oblique photography, variations in illumination and viewing angle lead to unstable interest point extraction, and neighborhood cross-correlation methods fail on wide-baseline images. Based on an analysis of continuous closed-loop image data, this paper introduces a feature tracking and matching algorithm for track closed-loop sequences of wide-baseline images. Firstly, interest points in each image are extracted by the SuperPoint algorithm, and continuous pairwise matching is carried out by the SuperGlue algorithm. Then, the matching results are used for feature tracking in both the forward and backward directions, and the tracking results of the two directions are fused. Finally, DEGENSAC is used to filter outliers and obtain the optimal matching result. The experimental results show that, for wide-baseline image data, the matching points obtained by this algorithm are more uniform than those of the ASIFT+FLANN algorithm, more feature points are matched than with the learning-based SuperPoint+SuperGlue algorithm, and the algorithm is more robust in wide-baseline feature matching.

13:30
Interact-Pose Datasets for Multi-Person Interaction Human Pose Estimation
PRESENTER: Yifei Jaing

ABSTRACT. The task of 2D human pose estimation in complex multi-person scenes has become a popular research topic in the field of computer vision. In recent years, several excellent works on human pose estimation have focused on the problems of complex backgrounds and multi-person scenes. However, when facing scenes of multi-person interaction, which usually appear in confrontational sports competitions or dancing, the results of current mainstream algorithms are still unsatisfactory. In addition, in the common datasets used for 2D human pose estimation, MPII, MSCOCO and CrowdPose, the number of images that meet the multi-person interaction standard is so small that these datasets cannot be used to cope with interaction problems. For this reason, we propose a new dataset named Interact-Pose for handling two-person and even multi-person interactions. Firstly, we annotate Interact-Pose in the MSCOCO format. In addition, we adopt a corresponding data augmentation scheme that exchanges the backgrounds of Interact-Pose images to make the dataset more complex and improve generalization performance. The dataset is then fused with MSCOCO for training. Finally, when HigherHRNet is trained on this fusion, the average AP on the COCO2017 validation set is 67.3%, which is 0.2% higher than training with MSCOCO alone. Meanwhile, on our proposed Interact-Pose, our scheme achieves an average AP more than 20% higher than the model trained with the COCO dataset only.

13:40
Multimodal breast cancer diagnosis based on Multi-level fusion network
PRESENTER: Mingyu Song

ABSTRACT. With the widespread application of artificial intelligence technology, deep learning algorithms have been extensively applied to the diagnosis and screening of breast cancer. However, breast cancer classification using data from a single modality is still not accurate enough to meet clinical needs. This paper proposes a multimodal breast cancer diagnosis method based on a multi-level fusion network, which integrates pathological images, structured data and medical description text. Specifically, we first construct a fully connected graph to extract node- and graph-level feature representations of pathological images with graph attention layers. Second, we use the BERT model to extract text features from the medical records. At last, the features of the three modalities are fused using a multimodal adaptation gate (MAG) for diagnosis. Experimental results indicate that the proposed method obtains superior performance (93.62% accuracy) to most baseline methods on the PathoEMR dataset.
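
As a rough illustration of MAG-style fusion, the sketch below follows the common displacement formulation: the auxiliary modalities produce a gated shift applied to the primary feature. The fixed scalar weights here stand in for learned parameters and are assumptions, not the authors' implementation.

```python
import math

def multimodal_gate(h_img, h_text, h_struct, beta=0.5):
    """Minimal MAG-style fusion sketch over plain Python lists."""
    # Gate: how strongly the auxiliary modalities should shift the
    # primary (pathology-image) feature at each dimension.
    g = [math.tanh(t + s) for t, s in zip(h_text, h_struct)]
    # Displacement vector built from the gated auxiliary features.
    h = [gi * (t + s) for gi, t, s in zip(g, h_text, h_struct)]
    # Shift the primary representation by a bounded amount.
    return [x + beta * d for x, d in zip(h_img, h)]
```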

13:50
Backdoor Attack Against Deep Learning-based Autonomous Driving with Fogging
PRESENTER: Xueyan Wang

ABSTRACT. In recent years, autonomous driving has been a main research direction of the automotive industry, and as research deepens, safety has become the top priority. A large number of studies show that the deep learning models widely used in autonomous driving are very fragile and may be compromised by carefully planned backdoor attacks during training. A backdoor attack injects a backdoor pattern into a small portion of the training data and then trains the target model (the victim model) on it. In the test or deployment stage, the victim model performs normally on clean data, but once it encounters data carrying the backdoor trigger, it predicts an incorrect result that may be specially chosen by the attacker. Based on this, this paper proposes a new fogging attack against autonomous driving: by fogging a small portion of the training data, a backdoor is implanted into the victim model. Experiments on three deep neural network models and two datasets show an attack success rate of 99% when the input image is fogged heavily enough.
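
A poisoning step of this kind might be sketched as follows. The haze blending model, the 5% poisoning rate and the target label are illustrative assumptions, not the paper's settings.

```python
import random

def fog(image, intensity=0.6, fog_value=255):
    """Blend a uniform haze into a grayscale image (list of pixel values)."""
    return [round((1 - intensity) * p + intensity * fog_value) for p in image]

def poison_dataset(images, labels, target_label, rate=0.05, seed=0):
    """Hedged sketch of a fogging backdoor: fog a small fraction of training
    images and relabel them with the attacker's chosen target class."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(images)), max(1, int(rate * len(images))))
    for i in idx:
        images[i] = fog(images[i])
        labels[i] = target_label
    return images, labels
```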

14:00
Discriminative Feature Fusion for RGB-D Salient Object Detection
PRESENTER: Zeyu Chen

ABSTRACT. RGB-D salient object detection is the segmentation of visually salient objects from an image with the aid of depth information. Although many excellent methods have been proposed, difficulties such as salient object localization still exist, owing to two challenges: (1) it is difficult to fully and effectively fuse RGB and depth features; (2) it is difficult to enhance the semantic information of low-level features and enrich the spatial information of high-level features. To address these problems, we propose a discriminative cross-modal fusion module to effectively fuse RGB and depth features, design a discriminative multi-scale fusion module to enhance the semantic and spatial information of the features, and embed a multi-scale contextual perception module in the network to accurately localize objects of different scales. We conducted comparison tests with 14 state-of-the-art methods on 8 datasets, and the experimental results demonstrate the superiority of our method.

14:10
Part based Face Stylization via Multiple Generative Adversarial Networks
PRESENTER: Wu Zhou

ABSTRACT. In recent years, thanks to improved research methods and the wide availability of open-source tools and related datasets, face stylization has become a hot research field with many applications. Face images need to be stylized in many scenarios, such as camera beautification and artistic photo processing. However, most current schemes are unsatisfactory: synthesis traces in the resulting images are obvious, and the effects are relatively monotonous. Based on a study of image features and style representation, this paper proposes a general-purpose, whole-process scheme for face image style transfer that fills a gap in local style transfer for face images. The system segments the portrait into parts and implements six functions: segmentation of specific portrait parts (hair), skin smoothing and whitening of the face, photo deblurring, hair style transfer, stray-hair removal, and an eye-enlargement effect. This study realizes automatic style conversion of specific face images quickly and with high quality.

14:20
FedMD - Federated Learning for Maximum Differential Choice Based on Global Perspective

ABSTRACT. Federated learning has received extensive attention as a new distributed learning framework, as it enables joint modeling without data sharing. However, it is still limited by communication bottlenecks: most clients may not be able to participate in joint modeling at the same time, resulting in slow convergence. To solve these problems, we propose a federated learning aggregation algorithm based on a global perspective, which considers the data distributions of participating clients. The server builds a feature distribution table according to the data distributions, and each time it selects a set of clients for training, it covers as many features as possible so as to learn the global data more fully. Specifically, the selection of these clients is not random: within the range of visible clients, the server constructs a set of clients with the largest mutual distribution differences, and places each selected set at the end of the selection chain after training, until all clients have been selected. We demonstrate the effectiveness of our work through comprehensive experiments and comparisons with the two most popular algorithms.
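
The greedy maximum-difference selection hinted at above could be sketched like this. The L1 distance over label-distribution vectors, the starting client and the tie-breaking are our assumptions.

```python
def distribution_distance(p, q):
    """L1 distance between two label-distribution vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))

def select_clients(feature_table, k):
    """Hedged sketch: start from an arbitrary client, then repeatedly add
    the visible client whose data distribution differs most from those
    already chosen (maximin criterion)."""
    chosen = [0]
    remaining = set(range(1, len(feature_table)))
    while len(chosen) < k and remaining:
        best = max(remaining, key=lambda c: min(
            distribution_distance(feature_table[c], feature_table[s])
            for s in chosen))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```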

14:30
GPointNet: Point cloud deep learning network based on PointNet

ABSTRACT. Due to the naturally unordered structure of point cloud data, it is still challenging to classify and segment 3D point clouds using deep networks. In previous work, PointNet was able to learn directly from point cloud data with good results. However, mapping the coordinate information of the point cloud to a high-dimensional redundant space and then using max pooling to extract global features still suffers from insufficient information and information loss. To this end, we propose two ways to improve PointNet more efficiently. On the one hand, we enrich the geometric information of points in the underlying 3D space. On the other hand, we extract global features across layers in a larger network to compensate for the loss caused by pooling. The new feature extraction method is applied to both the main network and the spatial transformation network. We add label smoothing to the loss function and use the Ranger optimizer in training to implement various point cloud tasks. Experiments show that our method outperforms PointNet in part segmentation and object classification.
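
The label smoothing mentioned above replaces hard one-hot targets with a mixture of the target and a uniform distribution, which discourages over-confident logits. The eps=0.1 below is a common default, not necessarily the paper's setting.

```python
def smooth_labels(one_hot, eps=0.1):
    """Mix a one-hot target with a uniform distribution over the k classes,
    so the target for the true class becomes (1 - eps) + eps/k and every
    other class receives eps/k."""
    k = len(one_hot)
    return [(1 - eps) * y + eps / k for y in one_hot]
```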

14:40
Network Intrusion Detection Method Based on Optimized Multiclass Support Vector Machine

ABSTRACT. With the popularization of network applications and great changes in the international political, economic and military situation, network security is becoming more and more important. As an important part of network security, network intrusion detection (NID) still faces low detection rates and difficulty meeting real-time demands as network traffic increases rapidly. Therefore, to meet the requirement of fast and accurate detection in real-time applications, this paper proposes a NID method based on an optimized multiclass support vector machine (SVM). Firstly, the ReliefF feature selection algorithm is introduced to extract features with heuristic search rules based on variable similarity, which reduces feature complexity and the amount of computation; secondly, an SVM training method based on data blocking is proposed to improve training speed; finally, a multiclass SVM classifier is designed for typical attack types. Experimental results show that the proposed method achieves a detection rate of 96.9% and shortens training time by 13.2% on average.

14:50
Substation meter detection and recognition method based on lightweight deep learning model

ABSTRACT. With the advancement of robotics, intelligent robots are widely used in substation inspections. Given that deep learning models have too many parameters while the performance of embedded devices is limited, this paper proposes a meter detection and recognition method based on a lightweight deep learning model, which supports deploying the model on substation intelligent inspection robots. First, target detection is performed on the input image to locate the bounding box of the dashboard; then the target area is extracted and semantically segmented to obtain the masks of the pointer and scale marks; the two-dimensional masks are converted into one-dimensional arrays by scanning, the positions of the pointer and scale marks are predicted through peak detection, and the reading is finally calculated from the scale positions and the meter range. In the target detection stage, pruning and knowledge distillation are applied to the YOLOv7-tiny model, greatly compressing the model while maintaining prediction accuracy; in the semantic segmentation stage, a lightweight U2NetP model based on depth-wise separable convolutions replaces the U2Net model, greatly reducing the number of model parameters. The experimental results show that the lightweight methods used in this paper compress the original YOLOv7-tiny model by 95.7% with an average accuracy of 90.5%, and compress the original U2NetP model by 76.8% with an average IOU of 88.7% and an average pixel accuracy of 99.4%.
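
The mask-scanning and peak-detection step might be sketched as follows. A real meter would be scanned in polar coordinates around the dial center; this toy column-sum version, and the linear reading formula, are simplifying assumptions.

```python
def mask_to_profile(mask):
    """Collapse a 2D binary mask (list of rows) into a 1D column-sum
    profile, the scan step that turns the mask into an array."""
    return [sum(col) for col in zip(*mask)]

def find_peaks(profile, min_height=1):
    """Plain peak detection: indices that are strict local maxima with at
    least min_height, locating pointer and scale-mark positions."""
    return [i for i in range(1, len(profile) - 1)
            if profile[i] >= min_height
            and profile[i] > profile[i - 1] and profile[i] > profile[i + 1]]

def read_value(pointer_peak, scale_peaks, full_scale):
    """Linearly interpolate the pointer position between the first and last
    scale marks, mapped onto the meter range."""
    lo, hi = scale_peaks[0], scale_peaks[-1]
    return full_scale * (pointer_peak - lo) / (hi - lo)
```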

15:00
Detection of insulator defects based on improved YOLOv7 model

ABSTRACT. Inspection of the electric transmission system is of great significance for powerline maintenance, in which insulator defects need to be found in time to preserve the safety of the whole system. To improve the accuracy and efficiency of insulator defect detection, computer vision techniques are employed. However, since insulator defects on insulator strings are small objects that usually appear in complex environments, it is challenging to obtain satisfactory detection results. To solve this issue, we propose an insulator defect detection method based on YOLOv7, one of the state-of-the-art object detection methods. By introducing a coordinate attention mechanism into the backbone network and redesigning the feature pyramid network (FPN) into a bi-directional FPN-like structure, we successfully adapt the original model to the insulator defect detection task. We used an open-source dataset called CPLID to train our model. Experiments demonstrate that our method achieves good performance for insulator defect detection and better average precision compared with other methods. Ablation studies were also conducted to verify the effectiveness of the improved components.

15:10
Thermal Defect Segmentation of Electrical Equipment based on Saliency Constraint

ABSTRACT. Thermal defect detection aims to identify overheated areas of electrical equipment with the help of infrared imaging technology. In this paper, we propose a thermal defect segmentation method based on a saliency constraint. Specifically, we first design a convolutional neural network for infrared image classification; the thermal images are then denoised and enhanced by preprocessing. Next, a modified K-means clustering algorithm is utilized for region segmentation, which partitions infrared images into environment, normal and thermal areas. Finally, we perform saliency detection on the infrared images to obtain the approximate region of temperature anomaly; the overheated area is likewise segmented with the modified K-means clustering algorithm and subsequently used to revise the thermal area segmented from the enhanced images so that it satisfies the saliency constraint. Experimental results suggest that our method improves the diagnostic efficiency of infrared images and realizes precise positioning of thermal defects, outperforming state-of-the-art methods.
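
The three-way region split can be illustrated with plain 1-D K-means on pixel temperatures. Initialization and the paper's modifications (including the saliency-constrained revision) are simplified away; this is only a sketch of the clustering idea.

```python
def kmeans_1d(values, k=3, iters=20):
    """1-D K-means over pixel values, splitting an infrared image into
    environment, normal and thermal clusters (k=3). Centers are seeded
    evenly across the value range and refined by Lloyd iterations."""
    centers = [min(values) + (max(values) - min(values)) * i / (k - 1)
               for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        # Assign each pixel to its nearest center.
        for v in values:
            clusters[min(range(k), key=lambda j: abs(v - centers[j]))].append(v)
        # Recompute centers; keep the old center for an empty cluster.
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return sorted(centers)
```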

15:20
Copy and Restricted Paste: Data Augmentation for Small Object Detection in Specific Scenes

ABSTRACT. Small object detection has long been a difficult task because of small object areas, low resolution, few available features and many other problems. To improve the performance of small object detection, a classical augmentation method that copies small objects and pastes them into the image is usually adopted. However, in some specific scenes, small objects cannot be pasted completely at random without any restriction on the paste area. In this paper, to address small object detection in specific scenes, we build on the copy-paste augmentation method and design three strategies that restrict the paste position of the copied object to the target area of the image. In this way, the augmented image is more realistic for the scenario, which improves the performance of small object detection. We conduct experiments on different object detection methods and validate that, in contrast to two-stage object detection methods, our copy-and-restricted-paste augmentation strategy is more suitable for one-stage object detection methods.
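
One possible restriction strategy can be sketched as sampling the paste position uniformly inside a scene-plausible region. The (x0, y0, x1, y1) region box and the fit check below are our assumptions; the paper's three strategies are not reproduced here.

```python
import random

def restricted_paste(img_w, img_h, obj_w, obj_h, region, rng=random):
    """Sample a top-left paste position for a copied object so that it lies
    fully inside both the image and the allowed region (e.g. the road
    surface for a traffic sign). Returns None if the object cannot fit."""
    x0, y0, x1, y1 = region
    # Clamp the feasible area so the whole object stays inside the region.
    max_x = min(x1, img_w) - obj_w
    max_y = min(y1, img_h) - obj_h
    if max_x < x0 or max_y < y0:
        return None  # object does not fit in the allowed region
    return rng.randint(x0, max_x), rng.randint(y0, max_y)
```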

15:30
Gait Recognition for Laboratory Safety Management Based on Human Body Pose Model

ABSTRACT. Most places storing important data and equipment, such as laboratories, mainly rely on manual management and various biometric systems to ensure their security. Currently, commonly used biometric systems include facial recognition, fingerprint recognition, voice recognition and gait recognition. Gait is a way of moving that is unique to each person. Compared with other biometrics, gait is difficult to imitate or fake and can accomplish supervision tasks more efficiently. This paper proposes a novel Human Body Pose (HBP) model for gait recognition in laboratory environments. Specifically, we first extract each frame from the video and extract 2D human body poses in the form of joints and bones with OpenPose. Then we use a 3D pose library to estimate a 3D human pose by matching it with the 2D pose. Finally, we employ a convolutional neural network to extract spatio-temporal features of the human body for gait recognition. We train and validate our method against state-of-the-art methods on the CASIA-B gait dataset. Experimental results show that our method outperforms the state-of-the-art methods under cross-view and clothing-change conditions.

15:40
Use Active Learning to Construct Japanese Emoji Emotion Database

ABSTRACT. Emojis are now frequently used in online communication, expressing rich meaning and emotional messages. However, communication will fail if the meaning of different Emojis is not well understood, especially between speakers of different languages and people from different countries or regions. There is currently very little research on Emoji datasets, since the process of building an Emoji database is labor-intensive and time-consuming. To solve this problem, we propose an active learning-based framework for building Japanese text datasets containing Emojis. This approach aims to achieve fast and balanced labeling of data given a small and unevenly distributed source of Emoji data. The active learning algorithm selects unlabeled data with high information content for manual labeling and updates the model parameters with the manually labeled data, iteratively constructing a large Emoji database. The constructed Japanese Emoji database contains a hundred types of Emojis, with at least a hundred labeled samples each. Our experiment suggests that the Emoji dataset can be efficiently constructed with balanced data and that the resulting dataset provides rich information for text emotion classification, yielding an accuracy of over 82%.
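
The selection step of such an active learning loop might look like this, using margin-based uncertainty as the acquisition function. That choice, and the `predict` interface, are assumptions; the authors' criterion is not specified in the abstract.

```python
def uncertainty(probs):
    """Margin-based uncertainty: a small gap between the top two class
    probabilities means the model is unsure about this example."""
    top = sorted(probs, reverse=True)
    return 1.0 - (top[0] - top[1])

def select_for_labeling(unlabeled, predict, batch=2):
    """Rank unlabeled texts by model uncertainty and return the most
    informative ones for manual annotation; `predict` maps an example to
    its class-probability list."""
    ranked = sorted(unlabeled, key=lambda x: uncertainty(predict(x)),
                    reverse=True)
    return ranked[:batch]
```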

15:50
Interface using Eye-gaze and Tablet Input for an Avatar Robot Control in Class Participation Support System
PRESENTER: Haruki Obayashi

ABSTRACT. In this research, an interface using eye-gaze and tablet input for an avatar robot is proposed to help students participate in classes remotely from their homes, hospitals or medical facilities. In an experimental study, attitude control of the camera through rotation driven by eye-gaze input and movement driven by tablet input was realized. By combining these two manipulations, the user can freely acquire environmental information and communicate with friends and teachers in the classroom.

16:00
Intelligent Identification of Similar Customers for Electricity Demand Estimation based on Metadata of Household Background
PRESENTER: Jing Jiang

ABSTRACT. Electricity consumption plays an extremely crucial role in the economic development of the world. To guarantee residential electricity supply, it is necessary to analyze the electricity consumption behaviors of customers. Because of the large number of customers in real life, similar customers must be grouped for a better understanding of their behaviors. Household income is a vital and appropriate indicator for discovering similar customers in a group. However, it is sometimes difficult to collect the income information of customers because of privacy protection and deliberate hiding of information. To address this issue, this paper proposes a method to intelligently identify similar customers by exploiting metadata of public household background information associated with household income. To evaluate the proposed method, we adopt the real datasets collected by Pullinger et al. [1]. This dataset comprises gas, electricity, and contextual data from 255 UK homes over a 23-month period ending in June 2018, with a mean participation duration of 286 days. The results demonstrate that the proposed method effectively groups customers with similar household income based on their metadata of public household background information only.

16:10
Data Security Knowledge Graph for Active Distribution Network
PRESENTER: Qianliang Li

ABSTRACT. The openness, interconnection and sharing mechanisms of the active distribution network bring great security risks to business system data. Existing data security protection strategies for the active distribution network are mostly based on encryption, access control and blockchain technology, which do not allow operation, maintenance and management personnel to intuitively understand, from a global perspective, the data security situation affecting the active distribution network. Therefore, this paper applies the concept of the knowledge graph to explore key technologies of data security knowledge graphs for the active distribution network. First, the key technologies for constructing a knowledge graph are explained in detail, covering named entity recognition, entity relation extraction and entity alignment. Then, taking active distribution network data as the object, the process of constructing a data security knowledge graph for the active distribution network is explained. Finally, the challenges of constructing a data security knowledge graph for the active distribution network are discussed.

16:20
Research on Construction Technology of Graph Data Model for Intelligent Operation and Inspection of Distribution Network
PRESENTER: Junjie Wang

ABSTRACT. As the scale of the power grid continues to expand, the traditional distribution network management model cannot meet the requirements of power grid development under the new situation. Current distribution network operation inspection is still lacking in data collection, making it impossible to establish an informatized and intelligent operation inspection management system, and the upper-level production management system cannot be integrated due to the lack of operation inspection and marketing data. Aiming at the performance problems of visualizing topology data in distribution network operation and inspection, this paper uses a graph data model to build knowledge, designs graphic elements for data migration, and forms a topology map for intelligent operation and inspection of the distribution network. The research clearly and intuitively displays the specific information of power system equipment and the physical relationships between equipment, thus forming a data model of the grid diagram.

16:30
The Management for Intelligent Early Warning of Stroke Empowered by Big Data
PRESENTER: Xiaoyong Chen

ABSTRACT. The global aging population, especially amid the global outbreak of the Covid-19 pandemic, has led to an imbalance in healthcare resources and endangered human health security. Digital information technology, through big data empowerment and intelligent application, is widely considered a key element in solving these problems. Stroke is a life-threatening disorder. Based on the theory of artificial intelligence and big-data-empowered Chinese medicine, we studied individual health management and abnormal risk perception using a human health state assessment model, and made full use of big data technologies for distributed storage, rapid processing, refined insight and visualization to monitor potential risks and issue real-time warnings of stroke. Clinical testing and evaluation provided a comprehensive prevention, control and emergency treatment system to improve the treatment of stroke, and the deep reconstruction of the health management system can generate new opportunities and ideas to resolve the development dilemma of the medical and health industry.

16:40
Predicting stock market index further trends based on the one-dimensional convolutional neural network
PRESENTER: Changhai Wang

ABSTRACT. In the sphere of financial investment, predicting future trends of stock market indexes from historical transaction data is a critical topic. Owing to the complexity and extreme volatility of the stock market, precisely predicting the trajectory of an index is challenging. To mitigate the volatility of short-term index prediction tasks, a long-term prediction method termed One-Covn is proposed. Specifically, this method takes the mean scale of rise and fall over the following few days, instead of the following single day, as the prediction label. First, a data normalization method is proposed, in which the historical transaction data are transformed to the scale of rise and fall. Then, a sliding window with a one-day step is applied to split the sequence data into prediction samples, and the corresponding labels are obtained at the same time. Finally, a one-dimensional convolutional network is utilized to extract deep features from the samples and map the features to the prediction label. To evaluate the algorithm's performance, 42 Chinese stock market indices were chosen as experimental data, and the mean absolute error (MAE) and mean squared error (MSE) were utilized as training loss functions. Classic approaches including ANN, LSTM and CNN-LSTM were chosen as comparison benchmarks. The results show that the method effectively reduces the average prediction error.
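
The sample construction described above can be sketched as a sliding window whose label is the mean rise/fall over the next few days. The window and horizon lengths here are illustrative, not the paper's hyperparameters.

```python
def make_samples(changes, window=20, horizon=5):
    """Slide a one-day-step window over daily rise/fall percentages; each
    window of length `window` becomes a sample, labeled with the mean
    change over the following `horizon` days (a smoother long-term target
    than the noisy next-day value)."""
    samples, labels = [], []
    for i in range(len(changes) - window - horizon + 1):
        samples.append(changes[i:i + window])
        future = changes[i + window:i + window + horizon]
        labels.append(sum(future) / horizon)
    return samples, labels
```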

16:50
3D Object Detector: A Multiscale Region Proposal Network Based on Autonomous Driving
PRESENTER: Shuo Yang

ABSTRACT. Recently, 3D object detection via point cloud processing has been applied to robotics and autonomous driving thanks to the popularity of LiDAR sensors. Compared with 2D images, point cloud data contain the depth and geometric space information of an object and achieve high precision for classification and localization. In traditional point cloud processing, the disorder and sparsity of the points are significant problems. In addition, traditional detectors can only process a limited number of points, making it difficult to detect objects from large point clouds. Previous methods therefore need to subsample the point cloud into a coarser form, so they cannot avoid information loss and their accuracy suffers, as seen in PV-RCNN. In this paper, we propose a multiscale feature fusion detector called the multiscale region proposal network (MS-RPN), which provides multiscale prediction results for difficult object categories. Meanwhile, our method improves detection accuracy for smaller objects through optimized processing in the multiscale feature extraction module. The efficiency and accuracy of the multiscale region proposal network were evaluated through numerous experiments on the KITTI 3D object detection dataset.