ISAIR2022: THE 7TH INTERNATIONAL SYMPOSIUM ON ARTIFICIAL INTELLIGENCE AND ROBOTICS 2022
PROGRAM FOR FRIDAY, OCTOBER 21ST

09:00-18:00 Session 1

This program is scheduled in China Standard Time.

ISAIR2022 Day1

Zoom Link:

Meeting ID: 850 7605 8878

Passcode: sent to your email.

Location: Main Room
09:00
Cross-Modal Generation and Pair Correlation Alignment Hashing
PRESENTER: Jiaxing Deng

ABSTRACT. Cross-modal hashing is an effective way to achieve cross-modal retrieval because of its low storage requirements and high retrieval efficiency. However, most previous cross-modal hashing methods directly convert data from different modalities into hash codes and use label information to construct the semantic correlation between them. These methods lack information interaction between the features of different modalities. Therefore, we propose Cross-Modal Generation and Pair Correlation Alignment Hashing (CMGCAH), which boosts cross-modal information interaction by generating cross-modal features and extra cross-modal similarity between cross-modal feature pairs. We design conditional generative adversarial networks to generate cross-modal feature representations from one modality to another. Then, a feature similarity alignment network is used to explore the cross-modal similarity and boost information interaction. Experiments are performed on two commonly used datasets with text-image modalities, and the results show that CMGCAH achieves the best performance compared with many existing methods.
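
The storage and speed argument for hashing can be illustrated with a minimal sketch. It is not the CMGCAH method: the 8-bit codes, item names and the `retrieve` helper below are hypothetical, and the codes are assumed to have already been produced by some image and text encoders. The sketch only shows why retrieval over binary codes is cheap: ranking reduces to XOR and bit counting.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two hash codes stored as integers."""
    return bin(a ^ b).count("1")


def retrieve(query_code: int, database: dict, top_k: int = 2) -> list:
    """Return the names of the top_k database items closest to the query code."""
    ranked = sorted(
        database.items(),
        key=lambda item: hamming_distance(query_code, item[1]),
    )
    return [name for name, _ in ranked[:top_k]]


# Hypothetical 8-bit image codes and one text query code (illustrative values only).
image_codes = {
    "img_cat": 0b10110010,
    "img_dog": 0b10100111,
    "img_car": 0b01001101,
}
text_query = 0b10110011

print(retrieve(text_query, image_codes))
```

Each item costs one byte of storage here, and a query needs one XOR and one bit count per database entry.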

09:10
Breast Ultrasound Tumor Detection Based on Active Learning and Deep Learning
PRESENTER: Jiyong Tan

ABSTRACT. Early breast cancer screening and diagnosis policy plays a significant role in reducing mortality from breast cancer, the most common malignant tumor in women. Therefore, its accuracy and efficiency are very important. To address the challenges in mass breast screening and diagnosis, including varied ultrasound image quality from different equipment and expensive professional annotation, we propose a novel method based on active learning and convolutional neural networks for selecting more informative images and for tumor detection, respectively. Firstly, we verify the effectiveness of active learning on our breast ultrasound data. Secondly, we select the informative images from the original training set using Multiple Instance Active Learning (MIAL) with a One-Shot Path Aggregation Feature Pyramid Network (OPA-FPN) structure. In this way, we effectively balance the ratio of hard samples to simple samples in the original training set. Finally, we train the model based on EfficientDet with specific and valid parameters for our breast ultrasound data. The corresponding ablation experiment verifies that the model trained on the dataset selected by combining MIAL with OPA-FPN exceeds the original model in sensitivity, specificity and F1-score. Meanwhile, when the corresponding metrics are kept approximately the same, the confidence of the new model on inference images is higher and more stable.

09:20
Adaptive Sliding Mode Control for a Hydraulic Position Servo System
PRESENTER: Yaxing Lu

ABSTRACT. The electro-hydraulic system is a typical complex system with nonlinear and high-order characteristics, which can seriously restrict the application of many advanced control algorithms. In this study, a sliding mode controller (SMC) based on an adaptive neural network is proposed for a hydraulic position servo system with nonlinearity and parameter uncertainties. The structural design of a rotary hydraulic actuator is first introduced, and a mathematical model of the hydraulic position servo system is constructed based on the dynamic characteristics of the servo valve and the liquid flow continuity equation. Next, the adaptive neural network algorithm and the sliding mode control technique are combined so that the position signal of the hydraulic joint tracks the desired command quickly and effectively. The SMC is adopted for its robustness against the uncertainty and nonlinearity of the target system, whereas the higher-order neural network observer is utilized to compensate for parametric uncertainties and improve accuracy. In addition, the closed-loop asymptotic stability of the designed control strategy is guaranteed by employing Lyapunov theory. To investigate the tracking performance of the proposed controller, numerical simulation experiments were carried out, and the results demonstrate the effectiveness of the proposed control scheme.

09:30
Multimodal Interaction Fusion Network based on transformer for Video Captioning
PRESENTER: Hui Xu

ABSTRACT. Learning to generate the description for a video is a challenging task, as it involves an understanding of both vision and language. Existing methods are mainly based on Recurrent Neural Networks (RNN). Nevertheless, RNNs have some limitations, such as weak representation power and their sequential nature. The transformer-based architecture was proposed to address such issues, and it is widely used in the domain of image captioning. Although it has achieved success in existing methods, its applicability to video captioning is still largely under-explored. To fully explore its significance in video captioning, this paper proposes a novel transformer-based network for video captioning named Multimodal Interaction Fusion Network (MIFN). To effectively learn the relationship between multiple features, a cross-attention module is introduced within the encoder, which provides a better representation. Moreover, in the decoder, we use a gated mechanism to filter the essential information for producing the next word. We evaluate the proposed approach on the benchmark MSR-VTT and MSVD video captioning datasets to illustrate its quantitative and qualitative effectiveness, and we run extensive ablation experiments to understand the significance of each component of MIFN. The experimental results demonstrate that MIFN obtains performance comparable to the state-of-the-art methods.

09:40
A Composite Position Control of Flexible Lower Limb Exoskeleton Based on Second-order Sliding Mode
PRESENTER: Xinyu Zhu

ABSTRACT. To address the problem that the trajectory tracking error of a flexible lower limb exoskeleton robot is too large under external disturbance and parameter uncertainty, a composite position control method based on second-order sliding mode is proposed. Firstly, the flexible lower limb exoskeleton robot is modeled using the Lagrange function. Secondly, considering that the system is affected not only by matched disturbance but also by unmatched disturbance, two finite-time state observers are used to observe and compensate for the two disturbances in real time. In the position control part, the super-twisting algorithm is used to ensure that the trajectory tracking error of the knee joint converges to zero in finite time. Finally, the stability of the proposed control strategy is proved with a Lyapunov function. The experimental results show that the proposed control strategy achieves more accurate trajectory tracking and better robustness than traditional sliding mode control, which indicates the superiority of the proposed control strategy.

09:50
An image enhancement method based on multi-scale fusion

ABSTRACT. To solve the problem of low contrast and poor visual effect in video images captured in underground coal mines, a method based on multi-scale fusion is adopted to enhance underground coal mine images. Firstly, improved homomorphic filtering and contrast-limited adaptive histogram equalization are used respectively to obtain an underground image with uniform brightness and a contrast-enhanced image. Then, the enhanced image is obtained by multi-scale fusion of the two processed images. Experimental results show that the algorithm can solve the problem of distortion and effectively improve the contrast and brightness of the image.

10:00
Face Detection Based on Improved MTCNN in Non-Uniform Light Environment
PRESENTER: Dandan Huang

ABSTRACT. To improve the accuracy of face detection under non-uniform low-light conditions, this paper proposes an adaptive image enhancement method based on improved dual gamma correction, together with an adaptive weight setting applied to MTCNN's face detection algorithm. The algorithm uses illumination compensation theory to perform image enhancement and other preprocessing on face images, and then adds an adaptive module to the MTCNN model. This significantly improves the accuracy and detection rate of MTCNN's face detection under non-uniform low-light conditions, and the recognition rate reached 93.58%. Experiments show that under non-uniform low-light conditions, this method has better accuracy and robustness than the original MTCNN method, which benefits the later face recognition task.
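
As a point of reference for the enhancement step, the sketch below shows plain single-exponent gamma correction on 8-bit intensities. It is not the improved dual gamma correction of the talk, whose details the abstract does not give; the function names and the default exponent of 0.5 are assumptions chosen for illustration. An exponent below 1 brightens dark pixels while leaving pure black and pure white unchanged.

```python
def gamma_correct(pixel: int, gamma: float) -> int:
    """Apply out = 255 * (in / 255) ** gamma to one 8-bit intensity."""
    normalized = pixel / 255.0
    return round(255.0 * normalized ** gamma)


def enhance_row(row, gamma: float = 0.5):
    """Apply the same gamma curve to every pixel of one image row."""
    return [gamma_correct(p, gamma) for p in row]


# A dark row of hypothetical pixel values: mid-dark values are lifted the most.
dark_row = [0, 16, 64, 128, 255]
print(enhance_row(dark_row))
```

A dual scheme would typically blend two such curves, one brightening dark regions and one compressing bright regions; how the weights are chosen is exactly the part this sketch leaves out.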

10:10
Optimal SVM using an improved FOA of evolutionary computing
PRESENTER: Xing Chen

ABSTRACT. To address the low recognition accuracy of the traditional model, this paper proposes a dynamic step size factor that lets the step size change dynamically, which gives the fruit fly optimization algorithm (FOA) strong local optimization ability and improves its convergence accuracy. The smell concentration determination formula is improved so that the algorithm can search in the negative value range. This paper constructs an improved FOA-SVM model and uses the heart dataset to evaluate the classification performance of the improved FOA-SVM algorithm. Finally, the gray texture joint feature parameters of coal and gangue are extracted as the input vector of the improved FOA-SVM classification model, and classification experiments on coal and gangue are carried out. The experimental results show that the improved FOA-SVM model proposed in this paper reaches an average accuracy of 96.30%, which improves the classification performance by 2.1% compared with the original algorithm.

10:20
Semi-supervised Cross-Modal Retrieval with Graph-based Semantic Alignment Network
PRESENTER: Lei Zhang

ABSTRACT. Semi-supervised cross-modal retrieval is an eclectic paradigm which learns common representations by exploiting underlying semantic information from both labeled and unlabeled data. Most existing methods ignore the rich semantic information of text data and are unable to fully utilize the text data in common representation learning. Moreover, they only consider the correlation of data with the same semantic label, and ignore the correlation between data with different semantic labels. In this paper, we propose a novel semi-supervised cross-modal retrieval method, called Graph-based Semantic Alignment Network (GSAN), which learns a common representation by aligning the features of different modalities with semantic embeddings of text data. Firstly, we design a Deep Supervised Semantic Encoding (DSSE) module to train the semantic projector and label predictor, which can exploit the semantic embeddings and the predicted labels from unlabeled data of the text modality. Then, a GAN-based Bidirectional Fusion (GBF) module is designed to learn the mapping networks of the two modalities (image-text). To make the mapping networks generate discriminative and modality-invariant common representations, we utilize the underlying semantic information exploited by DSSE to construct a Graph-based Triplet Constraint (GTC), which enforces feature embeddings from semantically matched (image-text) pairs to be more similar and pushes mismatched ones away. By fully using semantic information, our approach needs less labeled data while achieving the performance of state-of-the-art methods. In addition, since we only use the mapping networks trained in the GBF module to generate common representations in the inference stage, our approach is efficient and time-saving in real-world applications. Extensive experiments on four widely used datasets show the effectiveness of GSAN.

10:30
Sliding Mode Control For Manipulator Based On Non-linear Disturbance Observer
PRESENTER: Jing Bai

ABSTRACT. To address the difficulty of accurate trajectory tracking when a multi-joint robotic manipulator is affected by disturbances, a sliding mode control strategy based on a non-linear disturbance observer is proposed in this paper. Firstly, a non-linear disturbance observer is designed to accurately estimate the complex uncertain disturbance signals generated by modeling errors in the dynamics of the robotic manipulator system, joint friction and external disturbances during the control process. Secondly, the sliding mode controller is designed for the highly non-linear and highly coupled characteristics of the robotic manipulator system, and the stability of the system is proved by constructing a Lyapunov function. The control law is designed with the disturbance estimation term added for compensation. Finally, the simulation results show that the proposed method can minimize the influence of disturbance signals on the system, enhance the robustness of the system and improve the precision of target trajectory tracking.
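
The basic sliding mode idea behind this talk can be illustrated on a toy plant. The sketch below is not the proposed controller: it has no disturbance observer and no manipulator dynamics. It assumes a double integrator x'' = u + d with a bounded disturbance d = 0.5 sin(t), a sliding surface s = v + lam * x, and the switching law u = -lam * v - gain * sign(s). All names and numeric values are hypothetical. Because the gain exceeds the disturbance bound, s is driven to a small neighborhood of zero, after which x decays exponentially.

```python
import math


def simulate_smc(x0=1.0, v0=0.0, lam=2.0, gain=2.0, dt=0.001, t_end=10.0):
    """Regulate a disturbed double integrator to the origin with first-order SMC.

    Returns the final position and velocity after forward-Euler integration.
    """
    x, v = x0, v0
    for i in range(int(t_end / dt)):
        t = i * dt
        disturbance = 0.5 * math.sin(t)   # bounded, unknown to the controller
        s = v + lam * x                   # sliding surface
        sign_s = (s > 0) - (s < 0)
        u = -lam * v - gain * sign_s      # equivalent term plus switching term
        acceleration = u + disturbance
        x += v * dt
        v += acceleration * dt
    return x, v


final_x, final_v = simulate_smc()
print(final_x, final_v)
```

The switching term causes chattering of the order of gain * dt around s = 0. Estimating the disturbance with an observer and compensating for it, as the abstract describes, is one way to allow a smaller switching gain.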

10:40
Content Extraction of Chinese Archive Images via Synthetic and Real Data
PRESENTER: Xinning Li

ABSTRACT. With the development of the information age, awareness of archive informatization among governments, organizations and groups at all levels is constantly increasing. Key content extraction from scanned archive images has attracted wide attention from researchers. Using deep learning methods, we extract text and key elements (ID photo, seal, date of birth) by analyzing the archive images. Based on a real archive image dataset and a synthetic archive image dataset, we use the PP-OCR[1] model for structured text extraction and YOLOv5[2] to train a target detection model for ID photos, seals and dates of birth to extract the key elements. Since the real dataset cannot be made public, we use ordinary archive data for explanation.

10:50
An Active Robotic Detumbling method based on Deep Reinforcement Learning for Non-cooperative Spacecraft
PRESENTER: Hao Gao

ABSTRACT. Active contact detumbling using a brush-type contactor is an effective way to damp the angular momentum of a rotating target so that it is easier to capture. In this paper, we propose a new brush-type detumbling method based on deep deterministic policy gradient (DDPG) that, for the first time, achieves autonomous detumbling trajectory planning and control of the robot arm. Considering the characteristics of brush contact, we propose a recommended racemization point as a state space input, based on the relative position between the robot arm base and the outer corner point of the solar panel of the non-cooperative target, to avoid the curse of dimensionality. The rewards are designed to associate with the recommended point and continuous damped collisions. The agent is then trained to find the racemization point and generate smooth trajectory actions that increase the reward. The simulation scenario is built on the MuJoCo physics engine and includes the target, the air bearing table and the capture spacecraft carrying the UR10 arm with a flexible brush. Simulations are conducted to evaluate the performance of the proposed detumbling policy. Autonomous path planning and smooth detumbling control of the robot arm are realized by using DDPG. The robot arm can track the motion of the target panel and automatically select the optimal detumbling point. During a collision period, the joint angle changes smoothly, so that continuous damping is applied to the target to obtain the maximum damping effect. The method proposed in this paper can provide a new control strategy for active contact detumbling applications.

11:00
A Bilateral Controller For Pharyngeal swab Test Teleoperation System
PRESENTER: Liang Li

ABSTRACT. Due to the outbreak and gradual normalization of the novel pneumonia virus, throat swab testing has become an important measure for personnel entering and exiting customs. However, current testing methods expose medical staff to a dangerous environment for long periods, which can easily lead to their infection. In view of the relative shortage of medical staff, this paper proposes using a teleoperation system for pharyngeal swab testing and puts forward a corresponding bilateral teleoperation controller. Finally, its feasibility is verified by numerical simulations.

11:10
Enhanced Feature Fusion and Multiple Receptive Fields Object Detection
PRESENTER: Hailong Liu

ABSTRACT. CenterNet is a widely used single-stage anchor-free object detector. It uses only a single feature to detect all objects. In this paper, we present an enhanced feature fusion and multi-receptive-field object detector, named EM-CenterNet. Our detector merges feature maps of the same spatial size from the bottom-up pathway and the top-down pathway through lateral connections, and a semantic information transfer path is used to transfer the semantic information of high-level features to low-level features, enriching the semantic information contained in the last output feature. In addition, we design another key component, composed of several consecutive dilated convolutions and shortcut connections, so that our detector can cover objects of all scales. This paper compares the EM-CenterNet method with the baseline on the Pascal VOC and MS COCO datasets for training and testing. Experiments show that our method increases the AP by 12.2% on the Pascal VOC dataset and by 5.9% on the MS COCO dataset.
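
Why stacked dilated convolutions widen the range of object scales a detector sees can be checked with a short calculation. The sketch below assumes stride-1 layers that all share one kernel size; the dilation rates (1, 2, 4) are hypothetical, since the abstract does not state the ones used in EM-CenterNet. Each layer adds (kernel_size - 1) * dilation pixels to the receptive field.

```python
def receptive_field(kernel_size: int, dilations) -> int:
    """Receptive field (in pixels, one axis) of stacked stride-1 convolutions."""
    rf = 1
    for dilation in dilations:
        rf += (kernel_size - 1) * dilation
    return rf


# Three plain 3x3 convolutions versus three dilated ones (assumed rates).
print(receptive_field(3, [1, 1, 1]))  # 7
print(receptive_field(3, [1, 2, 4]))  # 15
```

With the same number of layers and parameters, the dilated stack covers roughly twice the extent, and shortcut connections let the narrower intermediate views reach the output as well.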

11:20
A Novel Object Detection Method Combining Self-Attention Mechanism and RetinaNet

ABSTRACT. RetinaNet is a machine learning-based object detection method that offers high detection accuracy and high detection efficiency. However, using only a traditional Convolutional Neural Network (CNN) to obtain the target characteristics introduces a certain translation invariance, which reduces the positioning ability of the model. In our work, we integrate a Self-Attention Mechanism (SAM) combined with differential evolution into the RetinaNet model, and on this basis propose an improved object detection method. The self-attention mechanism can effectively improve the performance of the CNN on the target pictures, better capture the feature correlation in the target pictures, and further improve the detection precision of the method. On this basis, applying the Differential Evolution algorithm to optimize the size and ratio of the anchors for the dataset improves both detection accuracy and detection efficiency. A comparative test is conducted on the PASCAL VOC dataset. The experiment shows that our proposed improvement increases the mAP (mean Average Precision) of the RetinaNet model by 7.5%, achieving 87.4% on the PASCAL VOC2012 test set.

11:30
Learning Visual Tempo for Action Recognition
PRESENTER: Mu Nie

ABSTRACT. The variation of visual tempo, which is an essential feature in action recognition, characterizes the spatiotemporal scale and the dynamics of the action. Existing models usually use spatiotemporal convolution to understand spatiotemporal scenarios. However, they cannot cope with differences in visual tempo, due to their limited view of the temporal and spatial dimensions. To address these issues, we propose a multi-receptive field spatiotemporal (MRF-ST) network in this paper to effectively model spatial and temporal information. We utilize dilated convolutions to obtain different receptive fields and design dynamic weighting with different dilation rates based on the attention mechanism. The MRF-ST network can directly obtain various tempos in the same network layer without any additional learning cost. Moreover, the network can improve the accuracy of action recognition by learning more visual tempos of different actions. Extensive evaluations show that MRF-ST reaches the state of the art on the UCF-101 and HMDB-51 datasets. Further analysis also indicates that MRF-ST can significantly improve performance in scenes with large variances in visual tempo.

11:40
Research on the identification method of audiovisual model of EEG stimulation source
PRESENTER: Zhaoxiang Lu

ABSTRACT. For the cloud center of the mining industry, one of the most important tasks is to obtain the safety state information of the special environment, because the hidden dangers of water, fire and gas always threaten human life and production. In addition, an audiovisual model of the EEG stimulation source is integrated by using a convolutional neural network model with the inception network. It fuses the EEG information with additional environment sensor information to increase the environment safety classification accuracy. Through brain network visualization and brain connection, we can obtain the density and weight changes of global network connections in different audiovisual stimulation stages, which makes the analysis of EEG signals more intuitive. The experimental results indicate that the environment safety recognition accuracy of the audiovisual model reaches 87.98%, 88.4% and 90.12% for the single visual, the single auditory and the audiovisual stimulus, respectively. The audiovisual modality performs better than the single visual or auditory stimuli.

11:50
Joint Semantic-Instance Segmentation Based on Multi-Scale Feature Extraction Network
PRESENTER: Jintong Cai

ABSTRACT. In recent years, it has become common in 3D point cloud segmentation tasks to extract semantic and instance feature matrices through a feature extraction network and merge them to improve the segmentation effect. However, this implicit processing method places high requirements on the feature extraction network, and the fineness of the features directly affects the final segmentation result. In this paper, we propose a new feature extraction network for segmentation by adding an encoder-decoder structure, which can extract multi-scale local feature information from the feature map. The multi-scale features are merged to obtain a better feature matrix, which improves the performance of the segmentation tasks. In evaluation on the S3DIS dataset, our new feature extraction network greatly improves both semantic segmentation and instance segmentation.

12:00
Deep Learning-based Virtual Trajectory Generation Scheme

ABSTRACT. At present, location-based services (LBS) have developed rapidly and are widely used in applications on various intelligent mobile terminals. The trajectory privacy of users is protected by virtual trajectory generation algorithms constructed with statistical methods. Because the user's motion model is a complex equation, it is difficult to model mathematically; consequently, the user's trajectory model does not consider the motion model, which limits the formation of the trajectory. As a result, previous virtual trajectory generation algorithms were not resistant to deep learning-based data mining attacks. In this paper, real and virtual trajectory discriminators are designed using LSTM (Long Short-Term Memory) technology, and a deep learning-based virtual trajectory generation scheme is proposed. Experiments show that false trajectories can be identified with a success rate of at least 96%, while for real trajectories the false positive rate is only 6.5%. The virtual trajectory generated by the proposed scheme has human motion patterns similar to those of the real trajectory, and protects against colluding, inference and channel attacks. The generated virtual trajectory points are not distributed in inaccessible areas of the map. On the premise that users obtain the service quality, user trajectory privacy is effectively protected to reduce the loss as much as possible.

12:10
Overview of Robotic Grasp Detection from 2D to 3D
PRESENTER: Zhiyun Yin

ABSTRACT. With the wide application of robots in life and production, robotic grasping is also developing continuously. However, in practical applications, external environmental factors and properties of the object itself affect the accuracy of grasp detection. There are many ways to classify grasp detection methods. In this paper, the parallel gripper is taken as the grasping end effector. Focusing on the angle problem of robot grasping, this paper summarizes the research status of grasp detection from 2D images to 3D space. According to their respective applications, advantages and disadvantages, this paper analyzes the development trends of the two approaches. In addition, several commonly used grasping datasets are introduced and compared.

12:20
Robust Audio Dual Watermarking Algorithm based on Melody Feature
PRESENTER: Xiaoman Dou

ABSTRACT. With the rapid development of the Internet, the release and dissemination of music products are becoming more and more convenient. At the same time, unauthorized forwarding or modification of music products occurs from time to time. The market needs reliable methods of copyright protection and integrity verification. However, the available audio watermarking algorithms for copyright protection are less effective against synchronization attacks and lack audio integrity verification. Because melody is robust to common signal processing and sensitive to malicious attacks, a robust watermarking scheme based on melody features is proposed in this paper. In this scheme, melody features and copyright information are combined to construct a watermark that can be used for integrity verification. Two robust watermark embedding algorithms, based on DCT-SVD and on the histogram, are then adopted to embed the generated watermark into the host audio in a complementary form, which resists different attacks and realizes copyright protection. At the same time, tamper detection and localization can be realized by comparing the melody features. In the extraction process, two groups of watermarks are extracted by the corresponding methods, and two copyright images are obtained. The Brenner function of these two images is calculated, and the image with the smaller Brenner value is selected as the final watermark image. The experimental results show that the proposed watermarking scheme is strongly robust to common signal processing and synchronization attacks. Through the melody features, the tampering position can be accurately identified, and audio integrity verification is realized.

12:30
Design of MobileNetV1 SSD target detection accelerator based on FPGA
PRESENTER: Luojia Shi

ABSTRACT. Object detection based on convolutional neural network has become one of the important algorithms in the field of computer target detection. However, due to the speed and power limitation brought by convolution computation, the algorithm of the convolutional network used for target recognition generally needs to be accelerated by the convolution accelerator before it can be effectively deployed to edge computing devices. In this paper, a new architecture of the convolution accelerator is proposed. On this basis, the convolution accelerator is used to build and complete the convolutional acceleration system, and the MobileNetV1 SSD detection algorithm network is realized. The convolution accelerator has a special operation channel, which can normalize convolution kernels of different sizes. Besides, it innovatively optimizes the addition operation mode in the convolution kernels, realizes pipeline convolution calculation, shortens the image recognition time and increases the versatility of the convolution accelerator.

12:40
Transformer-Based Point Cloud Classification
PRESENTER: Xianfeng Wu

ABSTRACT. In this paper, we propose a transformer-based point cloud classification method. We introduce different transformer modules into the three key networks of PointNet, to improve the discriminability and the stability of features extracted at different stages. Experimental results show that compared with the PointNet, our method has not only a higher classification accuracy but also a higher stability, especially when the points in the point cloud are extremely few.

12:50
Underwater Object Tracking Based on Error Self-correction
PRESENTER: Qinxu Jia

ABSTRACT. Underwater object tracking is more challenging than tracking on land due to image distortion. Underwater illumination changes, motion blur and other problems greatly degrade the performance of various tracking methods, especially when using underwater cameras. To overcome these limitations, we propose an underwater tracking method based on error self-correction. We combine the SiamFC tracker and a correlation filter tracker into one framework. In this framework, the correlation filter tracker is used to guide the update of the SiamFC tracker. In addition, this framework contains a multi-peak suppression module, which comprehensively improves the tracking accuracy of our tracker in underwater scenarios. Experimental results on the underwater dataset we established demonstrate that the success rate under the AUC criterion is 68.2%, an 11.4% improvement over the baseline tracker. Our proposed algorithm also achieves a 12.6% improvement over the conventional correlation filter tracker BACF, especially in challenging scenarios.

13:00
Novel Elimination Method of Baseline Drift Based on Improved Least Square Method
PRESENTER: Ruhao Zhang

ABSTRACT. With the development of brain-computer interfaces (BCI), the application of bioelectrical signals in the field of intelligent medical devices has been flourishing. Currently, how to remove interference from the electrical signal and improve recognition accuracy has attracted great attention. In the process of electromyographic (EMG) signal acquisition, baseline drift is a serious issue that can affect signal recognition accuracy, and the traditional least square method (LSM) cannot remove the filtered baseline drift component within the window. To address this issue efficiently, a modified least square method is designed in this paper, which employs a polynomial fit to remove the baseline drift component within the window by the curvature of the polynomial. The designed method not only retains the advantage of the LSM in terms of small operation size, but also improves the baseline drift removal capability, providing a solution for high-precision embedded bioelectric signal acquisition devices. Experimental results show that the improved least square method (ILSM) improves the baseline drift removal capability by about 5% over the LSM. In addition, compared to the LSM, the ILSM can reduce the number of window openings.
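
For orientation, the sketch below shows a windowed least-squares baseline removal in its simplest form: fit a straight line to each window and subtract it. This corresponds to a basic LSM-style baseline, not to the ILSM of the talk, whose polynomial order and curvature rule are not given in the abstract. The window length, the function names and the synthetic drift are assumptions.

```python
def fit_line(ys):
    """Least-squares slope and intercept for samples ys at x = 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        return 0.0, (ys[0] if ys else 0.0)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def remove_baseline(signal, window):
    """Subtract a per-window least-squares line from the signal."""
    cleaned = []
    for start in range(0, len(signal), window):
        segment = signal[start:start + window]
        slope, intercept = fit_line(segment)
        cleaned.extend(y - (slope * i + intercept) for i, y in enumerate(segment))
    return cleaned


# Synthetic example: a pure linear drift is removed almost exactly.
drift = [0.01 * i + 2.0 for i in range(100)]
residual = remove_baseline(drift, window=50)
print(max(abs(r) for r in residual))
```

A straight line cannot follow drift that bends inside a window, which is the kind of residual a higher-order polynomial fit is meant to capture.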

13:10
Point Cloud Driven Object Classification: A Review
PRESENTER: Zhongyuan Lai

ABSTRACT. The rapid development of 3D sensors makes it very easy to obtain large amounts of point cloud data. Point cloud data can reflect the rich real world in real time and is widely used in face recognition, intelligent transportation and other fields. As a basis of computer vision, point cloud classification technology has received extensive attention. The application of deep learning further improves the accuracy and robustness of point cloud classification. This paper first introduces the ModelNet40 dataset and evaluation indicators in detail. Then, two kinds of methods for directly processing point cloud data are introduced, namely point-based methods and graph-based methods. After that, the basic ideas and improvements of representative methods in each category are introduced, and the error sources of these methods are analyzed. The experimental results are then analyzed. Finally, future directions for point cloud classification are discussed.

13:20
A target detection model based on multi-scale feature fusion
PRESENTER: Yibo Sun

ABSTRACT. Target detection technology identifies and locates instance targets in a given image or video. It is one of the critical topics in computer vision and promotes the development of intelligent monitoring, face recognition, and image segmentation. At present, object detection models based on deep learning have low detection accuracy for small-scale objects. In this paper, we propose MS-Faster R-CNN, a target detection model based on multi-scale feature fusion. We adopt an optimized multi-scale feature fusion strategy on the basic architecture of Faster R-CNN. The strategy, combined with the FPN pyramid feature extraction method, uses two links to complete the feature fusion, which makes the semantics of the fused features richer and adapts them to targets of different scales. In the candidate box recommendation stage, we use a cascaded RPN and an optimized NMS method so that the candidate boxes of small-scale targets are not over-suppressed, which improves the recommendation efficiency of candidate boxes. Finally, ROI Align pooling based on bilinear interpolation is adopted to avoid the loss of accuracy caused by quantization.

13:30
A New Direct Acyclic Graph Task Scheduling Method for Heterogeneous Multi-Core Processors
PRESENTER: Feng Xiao

ABSTRACT. At present, the traditional methods of calculating the worst-case response time upper bound for directed acyclic graph (DAG) task scheduling on heterogeneous multi-core platforms have problems such as non-self-sustainability and too many blocking nodes, both of which make the response time upper bound pessimistic. We propose to reconstruct the DAG by adding execution edges in order to eliminate blocking nodes. Specifically, we first use triples to represent each node of the DAG task graph, then select blocking nodes according to priority rules and add execution edges to reconstruct the graph, and finally search for blocking nodes again and repeat the process until all blocking nodes have been eliminated. Our experiments show that the worst-case response time upper bound from this method achieves a 20% improvement in accuracy compared to the traditional methods.

13:40
Feature Analysis of Electroencephalogram Signals Evoked by Machine Noise
PRESENTER: Hongbin Wang

ABSTRACT. People who work in mines for a long time are affected by the noise of high-power coal mining machines, and prolonged exposure to machine noise may have a great influence on their physical and mental health. This paper aims to better understand the psychological state of mine workers, and to extract features from and analyze the changes in electroencephalogram (EEG) signals induced by machine noise stimulation. An improved singular value decomposition and reconstruction algorithm is used to remove artifacts from the EEG signal. The signal-to-noise ratio is increased from 9.462 to 11.063, the root mean square error is reduced from 4.435 to 4.232, and the correlation coefficient is improved to 0.813. In the experiments, the signals generated by the human brain under machine noise stimulation are collected and analyzed after filtering. The results show that the potentials at F3, F4 and other EEG-sensitive electrodes tend to adapt to a stable state when the frequency of the machine noise is different.

13:50
L-FPN R-CNN: An Accurate Detector for Detecting Bird Nests in Aerial Power Tower Pictures
PRESENTER: Guolin Dai

ABSTRACT. Bird's nest detection in power line inspection is a difficult object detection task because of few object samples, similar backgrounds, and occlusion by towers. In this paper, an improved Sparse R-CNN model, L-FPN R-CNN, is proposed. Firstly, an ex-grid initialization method is proposed for the learnable proposal boxes in Sparse R-CNN to improve the positioning accuracy of the proposal boxes. Then, the SiLU function is introduced into the dynamic detection head of the original Sparse R-CNN network, which gives the network stronger convergence ability. Finally, a feature pyramid structure that fuses low-level features, L-FPN, is proposed to improve the richness of the feature maps. The experimental results show that the average accuracy of the algorithm is 2.9% higher than that of the original algorithm, and that it detects bird's nests in aerial power tower images well.

14:00
Predictive Information Preservation via Variational Information Bottleneck for Cross-view Geo-Localization
PRESENTER: Wansi Li

ABSTRACT. Cross-view geo-localization task, which is to match two images with the same geographic target, but from different platforms, e.g., satellite-view and drone-view, has received significant attention in recent years. However, this research is impeded by the large visual appearance changes across different views and irrelevant content contained in the background. Previous work mitigates the geo-view gap by deploying a polar coordinate transformation to the aerial images or utilizing rich contextual information near the target as auxiliary information. Despite some promising breakthroughs made by such methods, they fail to consider the involvement of irrelevant features retained in the high-dimensional features, which reduces the accuracy of the retrieval result. In this paper, we propose a simple and efficient model termed Predictive Information Preservation Bottleneck (PIPB), using the variational information bottleneck to discard the irrelevant information and retain the predictive information, enhancing the robustness and generalization capability of the model. In particular, our proposed PIPB consists of two stages. At the first stage, we learn the part-based features of each image to make full use of neighbor clues, which is realized by the square-ring partition strategy. Then, at the second one, these learned representations are fed through the variational information bottleneck module to filter out superfluous information. This step can promote the robustness and generalization of our model and improve experiment performance. Extensive experiment results on the recently-released dataset University-1652 and the fundamental benchmark CVACT demonstrate the superiority of our PIPB approach compared to other state-of-the-art methods.

14:10
Brain Modeling For Surgical Training On the Basis Of Unity 3D
PRESENTER: Fengxin Zhang

ABSTRACT. Virtual surgical simulation training has the advantages of repeatability, convenient operation, strong immersion, etc. It can greatly reduce the cost of surgical training and decrease the risk of surgical operations. In practice, virtual soft tissue models must generate deformations in real time during the interaction, so as to provide force feedback. The plausibility of the model deformation therefore has to be fully considered when building the model. The mass-spring model is widely used in soft tissue modeling and deformation simulation due to its simple structure and high efficiency. In order to make the virtual surgical simulation more realistic and accurate, this paper establishes a mass-spring physics model in Unity3D based on the biomechanical characteristics of soft tissue. Newtonian classical mechanics is used, and the virtual brain modeling and collision are performed by numerical simulation. The platform is constructed using Unity 3D and C#. Results show that our model accurately reflects the deformations of soft brain tissue.
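
To make the mass-spring idea concrete, here is a minimal one-dimensional sketch: point masses joined by Hooke's-law springs with viscous damping, advanced with a semi-implicit Euler step. It is only an illustration under assumed parameters, written in Python rather than the Unity 3D and C# setup the abstract mentions; the function name `step` and all constants are hypothetical.

```python
def step(positions, velocities, rest_length, k, mass, damping, dt, fixed=(0,)):
    """Advance a 1D chain of masses and springs by one time step."""
    n = len(positions)
    forces = [0.0] * n
    for i in range(n - 1):
        # Hooke's law: force proportional to the spring's extension.
        stretch = (positions[i + 1] - positions[i]) - rest_length
        f = k * stretch
        forces[i] += f        # a stretched spring pulls node i toward node i+1
        forces[i + 1] -= f    # and pulls node i+1 back toward node i
    new_positions, new_velocities = [], []
    for i in range(n):
        if i in fixed:
            # Anchored nodes do not move.
            new_positions.append(positions[i])
            new_velocities.append(0.0)
            continue
        # Newton's second law with viscous damping.
        accel = (forces[i] - damping * velocities[i]) / mass
        v = velocities[i] + accel * dt
        new_velocities.append(v)
        # Semi-implicit Euler: use the updated velocity for the position.
        new_positions.append(positions[i] + v * dt)
    return new_positions, new_velocities
```

Starting from a chain whose last node has been pulled away from rest, repeated calls to `step` let the chain relax back toward its rest spacing; a 3D tissue mesh applies the same update per node with vector-valued forces.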

14:20
POFC: A method of face changing in Peking Opera

ABSTRACT. When standard neural network transfer methods are used for portrait style transfer, semantically correct transfer often cannot be guaranteed, and the methods tend to ignore the texture details in the style examples, leading to undesired results. This paper proposes a style transfer method for Peking Opera masks that does not require long training (no GPU is required for the calculation, and it takes about 10 minutes to run on the CPU) and does not require large datasets (only dozens of pictures are needed). This method can transfer the style of a portrait with a Peking Opera mask to another portrait. It uses two guides: a position guide and an appearance guide. The position guide ensures semantic consistency in the transfer process. The appearance guide ensures that the appearance of the target subject is preserved during transfer. Experiments show that our method can quickly and perfectly transfer the style of a Peking Opera face portrait to another portrait. For more information, please visit: http://47.104.215.227:88/

14:30
Motion Saliency detection based on Drosophila vision-inspired model
PRESENTER: Meng Zhou

ABSTRACT. Drosophila vision is extremely sensitive to moving targets and color opponency, which provides rich biological inspiration for the study of computer vision. Drosophila vision has been extensively studied in various aspects, but our understanding of the underlying neural computation remains poor. We propose a Drosophila vision-inspired model that constructs a complete visual motion perception system and a color processing system by integrating continuous computing layers to gain insight into the neural mechanisms of Drosophila vision. The model can be used for saliency detection in dynamic scenes. Especially in scenes where the color distinction is significant, it can accurately identify the motion of interest (MOI) while suppressing background interference and self-motion, because it relies on motion perception and color opponency modeled on Drosophila vision. Experiments on two large-scale video saliency detection datasets demonstrate the superiority of our model in saliency detection compared with state-of-the-art methods.

14:40
Exploring the Potential of Facial Physiological Signature on Happiness Detection
PRESENTER: Min Hao

ABSTRACT. A facial physiological signature (tissue oxygen saturation, StO2) can be measured non-invasively through hyperspectral imaging (HSI) technology. Our prior studies show significant differences among the variations of StO2 for different facial regions of interest (ROIs) when human emotion changes. In this research, 19 ROIs have been determined through a rigorous exploration of musculoskeletal anatomy and the facial action coding system (FACS). Then a 19-dimensional StO2 vector extracted from these ROIs, instead of the entire facial signature, is formulated as the feature input. During the experimental investigation, a total of fifty participants took part and were asked to express their spontaneous emotions (calm, happy, and unhappy: angry). Finally, substantial experiments all achieve acceptable recognition levels, demonstrating the potential effectiveness of using facial StO2 as an affective indicator to detect human emotions.

14:50
Research on matching mechanism and route planning of intercity carpool
PRESENTER: Qunpeng Hu

ABSTRACT. In recent years, intercity carpool has become more and more popular due to its convenience and low price, and it has gradually become one of the important modes of intercity transportation. The ride-sharing platform provides information exchange between passengers and drivers, allocates transportation tasks, and recommends the optimal route plan. Existing ride-sharing platforms fail to take users' personalized needs into account when assigning tasks, and users are not satisfied with the planned routes. This paper designs an allocation algorithm for intercity carpool (Allocation Algorithm 4 Inter-city Carpool, AA4IC) and proposes a pricing function related to the detour distance and user satisfaction, so as to ensure the highest benefits for ride-sharing platforms and drivers as well as the highest passenger satisfaction. The AA4IC algorithm is proved theoretically to be incentive compatible and budget balanced, and the effectiveness of allocation scheme generation and path planning is verified by experiments. When the algorithm is iterated 1000 times, the running time is less than 200 s, and the task assignment with optimal user satisfaction can be achieved.

15:00
Scalable and auditable self-agent pseudonym management scheme for intelligent transportation systems
PRESENTER: Ming Yang

ABSTRACT. Pseudonyms are the basic method of privacy protection in intelligent transportation systems, but a single-pseudonym scheme cannot effectively protect the location privacy of vehicles. A replaceable multi-pseudonym scheme can better protect the location privacy of vehicles. At present, multi-pseudonym schemes are mainly of two kinds: preloading and on-demand. In the preloading scheme, the validity period is fixed, replacement is limited, pseudonym waste is serious and the storage overhead is huge. The on-demand scheme needs the assistance of a roadside unit or pseudonym certificate authority, and suffers from large delay, poor security, the hidden danger of Sybil attacks and other problems. This paper proposes a massive pseudonym management scheme based on certificateless signatures and self-agent pseudonym generation by the vehicle. The scheme meets all the requirements of conditional pseudonym authentication. Through the self-agent generation of short-term pseudonyms, the system can efficiently manage the multiple pseudonyms of millions of vehicles. The supply of pseudonyms needs to be authorized by the pseudonym management authority, which avoids pseudonym abuse and Sybil attacks. In this scheme, a pseudonym log server is used to provide transparency of the management. For pseudonym revocation, we adopt a whitelist-like method instead of the traditional certificate revocation list. We have carried out security analysis and performance evaluation on this scheme, and found that it has good scalability and supports transparency of the management.

15:10
CROSS-LAYER FEATURE ATTENTION MODULE FOR MULTI-SCALE OBJECT DETECTION

ABSTRACT. Recent target detection networks adopt the attention mechanism for better feature abstraction. However, most of them draw feature attention from merely one or two layers, failing to obtain consistent results for objects with different scales. In this paper, we propose a cross-layer feature attention module (CFAM) which can be plugged into any off-the-shelf architecture, and demonstrate that attention obtained from multiple layers can further improve object detection. The proposed module consists of two components for cross-layer feature fusion and feature refinement, respectively. The former collects rich contextual cues by fusing the features from distinct layers, while the latter calculates the cross-layer attention maps and applies them to the fused features. Experiments show the proposed module improves the detection rate by 2% against the baseline architecture, and outperforms recent state-of-the-art methods on the Pascal VOC benchmark.

15:20
A semi-supervised remote sensing image road segmentation algorithm based on SegFormer
PRESENTER: Runtao Xi

ABSTRACT. To address the difficulty of obtaining pixel-level annotations for remote sensing images, this paper proposes a semi-supervised road segmentation method for remote sensing images. First, an unsupervised network is proposed to generate pseudo-labels for road images. This module uses a superpixel segmentation method to pre-segment the roads in the remote sensing images; a lightweight convolutional neural network then extracts road feature information and refines the superpixel segmentation results to generate pseudo-label images. Next, because the imbalance between the numbers of foreground and background pixels in remote sensing road images makes accurate segmentation difficult, the loss function of SegFormer is improved. Finally, the pseudo-label images and the original images are fed together into the improved SegFormer network for training. Experiments show that the proposed method achieves a better segmentation effect than PSPNet, HRNet and other methods.

15:30
Medical pathways models mined by complex healthcare logs
PRESENTER: Jie Xue

ABSTRACT. Obtaining medical pathways from large numbers of medical logs has become a research hotspot. In this article, we propose a method that combines trace clustering, process discovery and neural networks to discover medical pathway models from complex medical logs. The source medical logs are first structured as XES event logs. Cases with similar medical behavior are aggregated by trace clustering. Process mining is then used to generate process models, from which reasonable medical pathways are extracted. A neural network is used to determine the proportional characteristics of the medical pathways. These results are combined to form a usable medical pathway model. The experimental results show that the average simplicity of the generated process models is 0.695, the average accuracy of the neural network models is 93.44%, and the medical pathway model score is about 0.879.

15:40
A Study on Japanese Text Multi-Classification with ALBERT-TextCNN
PRESENTER: Zepeng Zhang

ABSTRACT. Text classification is an essential task in the domain of natural language processing (NLP), which involves assigning a sentence or document to an appropriate category. This paper mainly focuses on using ALBERT-TextCNN for Japanese text classification. First, data files from Japanese Wikipedia pages are collected and divided into 31 categories. Next, the ALBERT-TextCNN model for Japanese text classification is built in two steps: 1) select the ALBERT model as the pre-training model; 2) use TextCNN to further extract semantic features from the texts. We conducted experiments to compare the ALBERT-TextCNN model using the Sentencepiece tokenizer with other state-of-the-art models. The results show that the performance is improved by about 14.5%, 11.6%, 13.8%, and 13.3% in terms of Accuracy, Precision, Recall, and F1, respectively, which shows that the ALBERT-TextCNN model can be used to classify Japanese text effectively.

15:50
An Automated Method of Creating Lunchbox Image Datasets Using a Novel Clustering Algorithm
PRESENTER: Kohki Hayakawa

ABSTRACT. Meal assistance robots have been developed because people with upper limb disabilities have difficulty eating by themselves. To make such a robot easier to operate, we are developing a robot that uses machine learning to automatically select food and assist with eating. This machine learning requires the creation of high-quality datasets for each type of food. In this paper, we propose an improved automated method that applies Density-Based Spatial Clustering of Applications with Noise (DBSCAN) repeatedly to remove noisy images from the dataset. Experimental results show that the percentage of noise images in the dataset was reduced by 20%. In this way, we hope to improve the accuracy of the automatic selection implemented in the meal assistance robot.

16:00
Contrastive Prediction and Estimation of Deformable Objects Based on Improved Resnet

ABSTRACT. Because of the complexity of dynamic models for deformable linear objects, learning from visual models is difficult and feature information extraction is insufficient. Therefore, we propose a joint visual representation model that uses contrastive learning with an optimized encoder. We focus on optimizing the design of the encoder structure: we add a residual structure to the encoder, optimize the extraction and compression of its feature information, and limit its parameters to 3 million. Compared with other methods, our encoder has fewer parameters. At the same time, adding the residual structure yields not only good feature information but also efficient feature extraction. In the rope experiment, the trained model moves the rope from different initial states to the required shape; we collect information from the simulated environment without manual labeling, extract features through the encoder, and pass them to the downstream task. Experiments show that the evaluation of our model at 135° and 45° improved by approximately 50%. Moreover, because of its modular design, the model is better suited to the operational requirements of Industry 4.0 and is easy to transfer and replicate. This improves not only efficiency but also the manipulation of deformable objects by manipulators in practical engineering.

16:10
Face Recognition Based on Inverted Residual Network in Complex Environment of Mine

ABSTRACT. The recognition of personnel entering and leaving the mine is an important link in ensuring safe production. As an effective identity recognition technology, face recognition has been widely and deeply studied, but its recognition rate is not high in the complex and harsh environment of a mine, with varied facial expressions, pose variation, and low-resolution face images. To effectively improve the face recognition rate for miners under uneven illumination, a face recognition method based on an inverted residual network is proposed. In this method, optimizing the activation function greatly reduces the amount of calculation while keeping almost equivalent performance. By fusing the inverted residual network, the problem of partial feature information loss during training of the face recognition model is effectively solved, which greatly improves recognition accuracy. The experimental results show that the accuracy of the inverted residual face recognition model is 81.4%, which is 5.7% higher than the residual network algorithm with an additional 4.3% of time overhead, and 9.9% higher than the MTCNN model with only 1/13 of the recognition time of MTCNN.

16:20
Emotion-Sentence-DistilBERT: A Sentence-BERT-based distillation model for text emotion classification

ABSTRACT. Text emotion classification is a hot research area in natural language processing, aiming to classify human emotion into positive and negative categories based on words. As a well-established and widely used neural model, BERT has achieved many state-of-the-art results in various natural language processing tasks, including text emotion classification. However, the sentence embeddings from BERT have been shown to be insufficient in semantic representation, which we believe is especially crucial for building emotion classification models. Another issue with employing BERT in text emotion classification is the model complexity, which has limited its training and application in natural human-computer interaction for emotional robotics. In this paper, we propose a novel Emotion-Sentence-DistilBERT (ESDBERT) model, which explores the rich emotional representation in sentences via a Siamese Network based Sentence-BERT module and further reduces the model complexity through a Knowledge Distillation process. Experimental results suggest that the proposed model can learn a rich emotional representation and achieve promising accuracy for text emotion classification compared with the undistilled BERT-based models.

16:30
Stock Price Prediction in Chinese Stock Markets Based on CNN-GRU-Attention Model
PRESENTER: Jinchao Qi

ABSTRACT. Stock price prediction is a hot topic and has attracted considerable attention from both regulatory authorities and financial institutions. Because the fluctuation of stock prices is the result of many different factors, stock prices are not easy to predict. Traditional prediction solutions mainly use simple linear models based on statistical and econometric methods, and these solutions have difficulty handling nonstationary time series data. With the development of deep learning, some new models can not only handle non-linear data, but also retain useful information for better forecasting of stock prices. This paper aims to construct a CNN-GRU-Attention based model for price prediction in Chinese stock markets. First, the convolutional and pooling layers of the CNN are used to extract features of factor correlation information from the input data; then, the output feature matrix is used as input for the GRU model to forecast correlation; finally, the Attention mechanism is used to focus on the important characteristics of stock prices and optimize the model structure. We collect multi-dimensional stock data of the China SSE 50 index from 2011 to 2021 as our dataset and conduct a set of experiments to compare performance, which is measured in terms of Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), R squared (R2) score and Mean Absolute Percentage Error (MAPE). The experimental results show that the CNN-GRU-Attention based model outperforms state-of-the-art approaches in forecasting stock prices in Chinese stock markets.
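
For readers unfamiliar with the four evaluation metrics named in this abstract, the sketch below computes them from their standard textbook definitions. The function name `regression_metrics` and the sample values are assumptions for illustration; the abstract does not say how the authors computed them.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, R2 and MAPE (in percent) for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    # MAPE is undefined when a true value is zero; prices are assumed positive.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    return {"MAE": mae, "RMSE": rmse, "R2": r2, "MAPE": mape}


metrics = regression_metrics([100, 200, 300, 400], [110, 190, 310, 390])
```

Lower MAE, RMSE and MAPE indicate a closer fit, while R2 approaches 1 as the predictions explain more of the variance in the true series.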

16:40
Target-driven Autonomous Robot Exploration In Mapless Indoor Environments Through Deep Reinforcement Learning
PRESENTER: Wenxuan Shuai

ABSTRACT. In this paper, we present a deep reinforcement learning (DRL)-based autonomous end-to-end system for wheeled robots in an unmapped environment. Potential Points (PPs) are obtained along the way towards the global target as possible better navigation directions. Based on the available data, we use a novel heuristic function to evaluate and select the optimal waypoint. Following the waypoints, the robot is guided towards the global goal. A local navigation system based on DRL is developed to generate the motion policy that guides the robot between waypoints and towards the global goal. The Proximal Policy Optimization algorithm and long short-term memory form the foundation of the DRL network. The combined data of the laser readings and the robot's relative position to the target point serve as inputs. The linear and rotational velocities of the robot are the outputs. A special reward system is created to steer the robot away from dynamic obstacles and to maintain a smooth trajectory. A long short-term memory architecture is used to alleviate the local optimum problem and help avoid obstacles outside the current range of the sensors. Experiments demonstrate that the proposed method, which does not rely on a map or prior knowledge in complicated static as well as dynamic situations, has an advantage over similar exploration methods.

16:50
Background Subtraction Based on Visual Saliency
PRESENTER: Hongrui Zhang

ABSTRACT. Visual saliency plays a significant role in real life in quickly extracting necessary information from complex scenes. State-of-the-art background subtraction algorithms are limited under various real-world conditions, such as dynamic backgrounds and lighting, which lead to misjudgment and false detection. To alleviate the problem that a dynamic background is hard to compensate for, this paper proposes a background subtraction method based on visual saliency, which fully exploits the advantages of human visual saliency. An experimental comparison is first conducted to confirm that the visually salient region is a potential moving target area. Then feature corner samples are extracted from this area, and a sparse optical flow method is adopted to determine the moving target in the current frame. To further enhance the perception of moving targets in a dynamic background, an improved ViBe method based on a partially random background updating strategy is developed to realize accurate moving object extraction. In addition, the results of motion perception are purposefully used for background set updating. Experimental results indicate that the proposed model can effectively extract the moving target.

17:00
Image Rectification of Industrial Equipment Nameplate Based on Progressive Probabilistic Hough Transform
PRESENTER: Han Li

ABSTRACT. In this paper, we put forward an industrial nameplate picture correction method based on the Progressive Probabilistic Hough Transform. Our method can effectively correct the image tilt caused by the wrong shooting direction. It is also effective to some extent on oblique images taken from a long distance. We also introduce the Mining Equipment Nameplate Dataset. The frame of an industrial nameplate is quadrilateral. The two sides of the nameplate border in the photo will cross each other after being extended. This result is caused by the tilt of the shooting angle. Our method first converts the picture to grayscale. It then binarizes the image and applies a Gaussian smoothing filter. We use the Progressive Probabilistic Hough Transform to locate the two longest line segments in the picture. The four endpoints of the two line segments are the four corners of the quadrilateral. Finally, the corrected picture is obtained by perspective transformation. Our method makes the nameplate text more visible, and the detection method is fast and effective. The pictures obtained in the experiments are clearer and easier to observe. In the second half of the article, we list some experimental results. Our method can handle the requirements of actual production well.

17:10
ListMLE-based Personalized Teaching Resources Recommendation for Online-to-offline Hybrid Course Under Smart Education Environment

ABSTRACT. Smart education is a product of the development of information-based education, which enhances the intelligence of traditional education and realizes innovative education. Taking a student-centered approach that follows students' personalized learning characteristics and relies on online-to-offline hybrid courses, we have designed a new architecture for recommending educational resources to improve learning effectiveness and promote effective teaching by tapping into students' potential. The use of sequential learning technology for teaching resource recommendation is a popular research direction in intelligent education, and its core is the recommendation algorithm for personalized resources. In order to solve the problem of insufficient location information and low accuracy in result lists produced by learning to rank, a recommendation algorithm for interest points based on ListMLE is proposed. Firstly, the ListMLE algorithm is applied to interest point recommendation based on the attention difference of interest point locations in the recommendation list. Secondly, the influence of users' social relations is incorporated into the scoring function of ListMLE. Finally, a cost-sensitive method is introduced in the recommendation list calculation process. This paper proposes an online education resource recommendation method for personalized learning. Experimental results show that the algorithm outperforms the baseline learning-to-rank algorithm in terms of accuracy and recall. The method can be used to study students' learning behaviors and provide a theoretical basis for designing personalized learning programs based on students' learning status.
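
As background for the ListMLE objective this abstract builds on, the sketch below computes the plain ListMLE loss: the negative log-likelihood, under a Plackett-Luce model, of the ordering given by the true relevance labels. It leaves out the social-relation and cost-sensitive extensions the abstract describes, and the function name `listmle_loss` is hypothetical.

```python
import math


def listmle_loss(scores, relevance):
    """Negative log-likelihood of the ground-truth ordering given model scores."""
    # Ground-truth permutation: item indices sorted by relevance, best first.
    order = sorted(range(len(scores)), key=lambda i: relevance[i], reverse=True)
    ranked = [scores[i] for i in order]
    loss = 0.0
    for i in range(len(ranked)):
        tail = ranked[i:]
        # Numerically stable log-sum-exp over the items not yet placed.
        top = max(tail)
        log_sum = top + math.log(sum(math.exp(s - top) for s in tail))
        # -log( exp(s_i) / sum_{j >= i} exp(s_j) )
        loss += log_sum - ranked[i]
    return loss
```

The loss is small when the model scores rank items in the same order as the relevance labels and grows as the two orderings disagree, which is what makes it usable as a listwise training objective.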

17:20
Building extraction from hyperspectral remote sensing imagery based on a semantic segmentation model emphasizing useful feature detection of single sample
PRESENTER: Hui He

ABSTRACT. Hyperspectral remote sensing imaging technology supports many everyday applications, such as urban building statistics and green vegetation estimation. However, hyperspectral images contain a large number of spectral bands, and extracting valuable information from them with common image processing methods can require substantial human and material resources. How to classify and segment hyperspectral images more conveniently and quickly is therefore a focus of current research. In recent years, deep learning has become a popular choice for hyperspectral image segmentation, and models based on convolutional neural networks have achieved high accuracy and efficiency. However, deep learning often requires very large datasets to train the models, which consumes considerable computational resources and human labor. In addition, bad weather often reduces the amount of available data, so we cannot rely on a large number of labeled samples to train an accurate segmentation model. To solve the problem of insufficient data, we focus on designing a deep learning model that can segment hyperspectral images accurately even with a small number of training samples. This paper presents an encoder-decoder network that combines multi-scale feature fusion with a mixed attention mechanism. Experiments on the Houston 2018 hyperspectral dataset comparing the proposed model with DeepLab V3+, PSPNet, U-Net, and Swin-Transformer show that the proposed model produces more accurate segmentation results. Its advantage becomes more prominent as the number of training samples decreases.
When only 50% of the samples in the original training set were used, the mean intersection over union (mIoU) reached 91.90%, 4.5% higher than the second-best method; in the 16-shot and 8-shot cases, it remained at 89.42% and 77.11%, which is 13.6 and 18 percentage points higher than the second-best method, respectively. The visual results also show that the proposed method is clearly better than the comparison methods. The proposed model is therefore suitable for accurate building extraction from hyperspectral images when only small datasets are available.
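
As a reminder of how the mIoU metric quoted above is typically computed, here is a minimal sketch over flattened label maps. The function name `mean_iou` and the convention of skipping classes that appear in neither the prediction nor the ground truth are assumptions; the abstract does not state the exact evaluation protocol.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union for flat lists of class labels (sketch)."""
    ious = []
    for c in range(num_classes):
        # Pixels labeled c in both prediction and ground truth.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labeled c in either prediction or ground truth.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # assumption: ignore classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: 0 = background, 1 = building.
pred = [0, 0, 1, 1]
target = [0, 1, 1, 1]
print(round(mean_iou(pred, target, num_classes=2), 4))  # → 0.5833
```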

17:30
Grasping Position Estimation Method Using Depth Image for Thin Objects
PRESENTER: Takuya Yoshihara

ABSTRACT. The robot market in Japan is gradually expanding due to increasing demand, and industrial robots are being actively introduced in the manufacturing industry. Introducing robots has three advantages: securing labor, increasing productivity, and improving quality. Robots can operate for long periods with constant work efficiency, enabling stable production. In addition, by replacing human labor, robots can secure the workforce and reduce human error. One drawback of introducing robots is the need to teach the grasping position, which requires time and a technician with specialized knowledge. Furthermore, a robot taught in this way cannot grasp an object that is not in the specified position. The introduction of robot vision may solve these problems. In this study, we target thin objects. By using depth images, processed images, and deep learning models, we aim for highly accurate grasping position estimation that is independent of the object color. Experiments were conducted on image processing of depth images and on improvements to the deep learning model. In the image processing, depth images were sharpened mainly by applying gray-scale transformations. To improve the deep learning model, we modified the model structure.
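
The abstract does not say which gray-scale transformations were applied to the depth images. As one plausible illustration only, the sketch below applies a linear contrast stretch, which spreads a narrow range of depth values (as produced by a thin object lying on a flat surface) over the full 0 to 255 gray range. The function name `stretch_depth` and the choice of transformation are assumptions, not the authors' method.

```python
def stretch_depth(depth, out_max=255):
    """Linearly rescale a 2D depth map (list of rows) to 0..out_max (sketch)."""
    flat = [v for row in depth for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant image carries no contrast to stretch.
        return [[0 for _ in row] for row in depth]
    scale = out_max / (hi - lo)
    return [[round((v - lo) * scale) for v in row] for row in depth]


# Hypothetical depth readings in millimetres: the object is only 5-20 mm
# closer to the camera than the table, so raw contrast is very low.
depth = [[500, 505],
         [520, 500]]
print(stretch_depth(depth))  # → [[0, 64], [255, 0]]
```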

17:40
A SQL Injection Attack Recognition Model Based on 1D Convolutional Neural Network
PRESENTER: Jing Jiang

ABSTRACT. The rapid development of Internet technology and the widespread adoption of web applications have brought great convenience to people's lives. At the same time, as more user information is exchanged between browsers and servers, attacks on web applications have intensified, and network security incidents occur frequently. SQL injection has long been a common tactic of cyber attackers because it is simple to carry out and poses a high threat. New SQL injection attacks now emerge continually, and detection models based on traditional machine learning algorithms can no longer identify complex SQL injection attacks effectively and accurately. In this paper, we propose a SQL injection recognition model based on a one-dimensional convolutional neural network (1D CNN) that combines word embedding and ASCII transcoding techniques. The model can recognize many kinds of SQL injection attacks more efficiently and accurately, substantially reduce human intervention, and offer some defense against previously unseen zero-day attacks.
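
To illustrate the general idea of feeding ASCII-coded text to a one-dimensional convolution, here is a minimal sketch. The abstract does not give the preprocessing details or the network architecture, so the fixed length, zero padding, handling of non-ASCII characters, and the function names `ascii_encode` and `conv1d` are all assumptions.

```python
def ascii_encode(query, max_len=16, pad=0):
    """Map a query string to a fixed-length list of ASCII codes (sketch)."""
    # Truncate to max_len; non-ASCII characters are replaced by the pad value.
    codes = [ord(ch) if ord(ch) < 128 else pad for ch in query[:max_len]]
    # Right-pad short inputs so every sample has the same length.
    return codes + [pad] * (max_len - len(codes))


def conv1d(seq, kernel):
    """Valid-mode 1D cross-correlation, the core operation of a 1D CNN layer."""
    k = len(kernel)
    return [sum(seq[i + j] * kernel[j] for j in range(k))
            for i in range(len(seq) - k + 1)]


encoded = ascii_encode("' OR 1=1", max_len=10)
print(encoded)  # → [39, 32, 79, 82, 32, 49, 61, 49, 0, 0]

# A hand-picked kernel for illustration; a real model would learn its kernels.
features = conv1d(encoded, [1, 0, -1])
print(len(features))  # → 8
```

In a trained model, many such kernels slide over the encoded sequence and their outputs feed later layers that classify the query as benign or malicious.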