ISAIR2021: THE 6TH INTERNATIONAL SYMPOSIUM ON ARTIFICIAL INTELLIGENCE AND ROBOTICS
PROGRAM FOR MONDAY, AUGUST 23RD

09:00-12:00 Session 10: Poster Session 2
Customer Data Privacy Protection Method Based On Singular Value Decomposition Clustering Algorithm

ABSTRACT. With the rapid development of information technology, large-scale personal data, including data from sensors and IoT devices, are stored in the cloud or in data centers. In some cases, the owner of the cloud data center needs to publish the data. Making full use of data while limiting the risk of personal information leakage has therefore become a hot research topic. When data is released multiple times, personal privacy can also be revealed. This paper proposes a new method based on a singular value decomposition clustering algorithm to complete the clustering process. In this way, data availability and privacy protection can be flexibly adjusted by considering the concepts of distance and information entropy. In addition, this paper proposes a dynamic update mechanism that ensures that personal data is not compromised after multiple releases of data and that information loss is minimized. Experiments measuring the average information loss and the amount of forged data demonstrate the availability and advantages of this method.

Robust Tracking Design for Quadrotor Unmanned Aerial Vehicle: A GPI Observer based Approach

ABSTRACT. This paper studies trajectory tracking control for a quadrotor unmanned aerial vehicle (UAV) with unknown time-varying disturbances, including model errors, parametric uncertainties, and external disturbances such as wind effects. Conventional backstepping control schemes cannot achieve satisfactory performance in handling time-varying disturbances. Improved schemes, such as integral backstepping, can only compensate for the disturbances relatively slowly. By introducing disturbance observer technology into the design process, a composite GPI observer-based robust control scheme is developed. Firstly, a generalized proportional integral observer (GPIO) is used to estimate the lumped time-varying disturbances. Secondly, a composite controller combining backstepping control and disturbance estimation is designed, which is called the BSC+GPIO method. This control method is convenient to implement, and the disturbance rejection capability is enhanced by feedforward compensation using the estimated value of the system disturbances. Simulation results illustrate the good tracking performance and the robustness of the proposed scheme.
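The observer-plus-feedforward idea in this abstract can be illustrated on a toy problem. The sketch below is not the authors' quadrotor controller: it assumes a scalar plant x' = u + d(t), uses plain proportional feedback as a stand-in for backstepping, and picks a hypothetical disturbance, reference, and gains (`k`, `w`). The three-state observer estimates the state, the lumped disturbance, and its derivative, and the disturbance estimate is subtracted from the control input.

```python
import math

def simulate(use_observer, t_end=8.0, dt=0.001, k=2.0, w=20.0):
    """Track r(t) on the toy plant x' = u + d(t).

    Returns the largest |x - r| seen during the last 2 seconds.
    All numeric values here are illustrative assumptions.
    """
    l1, l2, l3 = 3 * w, 3 * w ** 2, w ** 3   # observer gains: triple pole at -w
    x = 0.5                                   # plant state
    z1, z2, z3 = x, 0.0, 0.0                  # estimates of x, d and d'
    worst = 0.0
    for i in range(int(round(t_end / dt))):
        t = i * dt
        r = 0.5 * math.sin(0.5 * t)           # hypothetical reference
        r_dot = 0.25 * math.cos(0.5 * t)
        d = 0.5 + math.sin(t)                 # hypothetical lumped disturbance
        if t >= t_end - 2.0:
            worst = max(worst, abs(x - r))
        d_hat = z2 if use_observer else 0.0
        u = -k * (x - r) + r_dot - d_hat      # feedback plus feedforward compensation
        e1 = x - z1                           # observer output error
        # Forward-Euler step of the plant and the observer.
        x_next = x + dt * (u + d)
        z1_next = z1 + dt * (u + z2 + l1 * e1)
        z2_next = z2 + dt * (z3 + l2 * e1)
        z3_next = z3 + dt * (l3 * e1)
        x, z1, z2, z3 = x_next, z1_next, z2_next, z3_next
    return worst

error_with_observer = simulate(use_observer=True)
error_without_observer = simulate(use_observer=False)
print(error_with_observer, error_without_observer)
```

On this toy plant the compensated loop ends with a much smaller tracking error than the feedback-only loop, which is the qualitative effect the abstract attributes to feedforward compensation.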

Human motion capture data recovery with discrete subspace structure constraint

ABSTRACT. Low-rank matrix completion works well for human motion recovery, but it ignores the subspace structure among human poses in the spatiotemporal dimension. Based on the assumption that human motion data are drawn from multiple subspaces, we propose a novel human motion recovery model with a discrete subspace structure (DSS) constraint, which integrates the techniques of subspace clustering and low-rank matrix completion. The model takes into account the multi-subspace structure, low-rank, and temporally smooth properties of human motion capture data, and it can perform recovery and classification simultaneously. To better approximate the low-rank property, we use the Schatten p-norm as a surrogate for the rank function of a matrix. In our experiments, we choose three representative motion sequences from the CMU library, and the recovery results show that our method outperforms classical methods, e.g., TSMC and SCMC, while also achieving good classification results.

A Trajectory Tracking Method of Mobile Robot Based on Sliding Mode Control and Disturbance Observer
PRESENTER: Yang Zhang

ABSTRACT. A trajectory tracking method based on sliding mode control (SMC) and a disturbance observer is proposed for a wheeled mobile robot (WMR) subject to unknown disturbances. First, a continuous sliding mode control (CSMC) method is designed for an uncertain WMR. However, when the system faces stronger internal or external disturbances, a large control input must be designed, which leads to a large steady-state tracking error. To improve robustness and reduce the tracking error, two nonlinear disturbance observers (NDOs) are designed to estimate, respectively, the unpredictable disturbances in the kinematic and dynamic descriptions of the system, such as skidding, slipping, and parameter uncertainties. Then, with the aid of the disturbance estimates, a new trajectory tracking method is constructed for the WMR, which consists of the CSMC and NDOB. Moreover, the stability of the entire closed-loop system under the present control strategy is proved in detail. Finally, the tracking performance of the proposed controller is verified in simulation.

Multi-agent reinforcement learning for prostate localization based on multi-scale image representation

ABSTRACT. The analysis of magnetic resonance (MR) images plays an important role in medical diagnosis. The localization of the anatomical structure of lesions or organs is a very important pretreatment step in clinical treatment planning. Furthermore, the accuracy of localization directly affects the diagnosis. We propose a multi-agent deep reinforcement learning-based method for prostate localization in MR images. We construct a collaborative communication environment for multi-agent interaction by sharing the parameters of the convolution layers of all agents. Because each agent needs to choose its actions independently, the fully connected layers are separate for each agent. In addition, we present a coarse-to-fine multi-scale image representation method to further improve the accuracy of prostate localization. The experimental results show that our method outperforms several state-of-the-art methods on the PROMISE12 test dataset.

Short-term load forecasting based on improved GEP and abnormal load recognition

ABSTRACT. Short-term power load forecasting is very important to the economic dispatch and safety assessment of the power system. Although existing short-term power load forecasting algorithms have reached the required forecast accuracy, most of the forecasting models are black boxes and cannot be expressed as explicit mathematical models. At the same time, abnormal load caused by failures of the load data collection device, time synchronization, and malicious tampering greatly reduces the accuracy of existing load forecasting models. To address these problems, this paper proposes a Short-Term Load Forecasting algorithm based on Improved Gene Expression Programming (GEP) and Abnormal Load Recognition (STLF-IGEP_ALR). Firstly, the Recognition algorithm of Abnormal Load based on Probability Distribution and Cross Validation (RAL-PDCV) is proposed. By analyzing the probability distribution of rows and columns in the load data, and using these row and column distributions for cross-validation, the misjudgment of normal load as abnormal load can be reduced. Secondly, by designing strategies for adaptive generation of population parameters, individual evolution of populations, and dynamic adjustment of genetic operation probability, an Improved Gene Expression Programming based on Evolutionary Parameter Optimization (IGEP-EPO) is proposed. Finally, the experimental results on two real load datasets and one open load dataset show that, compared with existing abnormal data detection algorithms, the proposed algorithm performs better in missed detection rate, false detection rate, and precision rate, and outperforms state-of-the-art short-term load forecasting algorithms in terms of convergence speed, MAE, MAPE, RMSE, and R2.
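For readers unfamiliar with the forecasting metrics named in this abstract, the following minimal sketch computes MAE, MAPE, RMSE, and R2 using their standard textbook definitions. The function name and the sample load values are illustrative assumptions and are not taken from the paper.

```python
import math

def forecast_metrics(actual, predicted):
    """Return MAE, MAPE (in percent), RMSE and R2 for two equal-length sequences.

    MAPE assumes no actual value is zero; R2 assumes the actual values are not all equal.
    """
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MAPE": mape, "RMSE": rmse, "R2": r2}

# Made-up hourly loads (e.g. in MW) and forecasts, for illustration only.
metrics = forecast_metrics([100.0, 200.0, 300.0, 400.0],
                           [110.0, 190.0, 310.0, 380.0])
print(metrics)
```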

Spontaneous facial expression database of learners’ academic emotions in online learning with hand occlusion

ABSTRACT. Academic emotions refer to various emotional experiences in connection with learners’ academic activities in the learning process, which are vital to the development of learners’ physiology and mentality. Facial expression recognition (FER) technology has been widely used in online learning to recognize learners’ academic emotions. However, learners often inadvertently cover part of their face with their hands in online learning, which affects the recognition accuracy on academic emotions. Most existing databases lack facial expression data with hand occlusion, which makes it difficult for researchers to further improve recognition accuracy. Therefore, this research established an online learners’ facial expression database with hand occlusion (OLFED-HO) to solve the above problem. This database has a total of 92947 facial expression images of online learners, including four different hand occlusion situations (no occlusion, left occlusion, middle occlusion, and right occlusion) and seven academic emotions (confusion, curiosity, distraction, enjoyment, fatigue, depression, and neutral). Then, in order to prove the high reliability of our database established in this study, we analyzed the confusion matrix and concluded that the expression labels marked by different external coders have a high internal consistency. Such a database is expected to further promote the application of expression recognition technology in the field of education, and fill the gap of the academic emotion database with hand occlusion. In addition, an automatic facial expression recognition method with transfer learning based on region attention networks (RAN) is proposed in this paper, which efficiently reduces the impact of hand occlusion. And the proposed architecture achieved an accuracy of 89% on the test set of our database.

A hashtag-based sub-event detection framework for social media

ABSTRACT. Event detection is an important research topic in Web mining and knowledge management. Detecting sub-events from a large amount of noisy event-related information in social media helps further analysis and is considered an effective method for event analysis. Because online social media is noisy and sparse, existing sub-event detection methods perform poorly owing to their insufficient use of semantic information. To overcome the shortcomings of traditional methods, the main contributions of this paper are: (1) a Text-CNN based model to capture the semantic information of hashtags, (2) an attention mechanism to enhance sub-event representation, and (3) a two-step training with KL divergence to enhance the detection capabilities. Experiments show that our model outperforms state-of-the-art methods in most cases regarding Normalized Mutual Information (NMI), BCubed, Purity, and Adjusted Rand Index (ARI). For Chinese social media event detection, our model increased the NMI by 7.5%, Purity by 4.6%, and ARI by 7.4% on average over the baseline model.
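Of the clustering metrics listed in this abstract, Purity is the simplest to state: each detected cluster is credited with its most frequent ground-truth label, and the credited items are divided by the total. The sketch below uses that standard definition; the function name and the toy labels are assumptions for illustration only.

```python
from collections import Counter

def purity(true_labels, cluster_ids):
    """Fraction of items that carry the majority true label of their cluster."""
    clusters = {}
    for label, cluster in zip(true_labels, cluster_ids):
        clusters.setdefault(cluster, []).append(label)
    majority_total = sum(
        Counter(members).most_common(1)[0][1] for members in clusters.values()
    )
    return majority_total / len(true_labels)

# Six hypothetical posts: ground-truth sub-events vs. detected clusters.
truth = ["a", "a", "a", "b", "b", "c"]
detected = [0, 0, 1, 1, 1, 2]
score = purity(truth, detected)
print(score)
```

Purity rewards many small clusters (one cluster per item scores 1.0), which is one reason it is usually reported alongside NMI and ARI, as the abstract does.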

Variational Local Gradient Threshold Driven Convex Optimization for Single Image Reflection Suppression

ABSTRACT. To better suppress the reflection layer in images shot through glass, this paper proposes a single-image reflection suppression model to improve image quality. We combine the local linear model of the guided filter with a gradient threshold to enhance the boundary contours of the image and thereby suppress reflections, and we efficiently solve the resulting partial differential equations using the discrete cosine transform. Experiments on images taken in different scenes demonstrate the superiority of this method for single-image reflection suppression.

Deep learning method for makeup style transfer: A survey

ABSTRACT. Makeup transfer is one of the applications of image style transfer. It refers to transferring the reference makeup to a face without makeup while maintaining the original appearance of the plain face and the makeup style of the reference face. To understand the research status of makeup transfer, this paper systematically reviews makeup transfer technology. Following the development of makeup transfer methods, the paper first introduces and analyzes the traditional methods of makeup transfer. In particular, the methods of makeup transfer based on deep learning frameworks are summarized, covering both their advantages and disadvantages. Finally, some key points in the current challenges and future development directions of makeup transfer technology are discussed.

Image Super-Resolution Reconstruction Using Mixed Deep Convolutional Networks

ABSTRACT. To address the problems of blurred reconstruction, heavy noise, and poor visual perception in traditional image super-resolution reconstruction methods, this paper proposes an improved image super-resolution reconstruction method based on mixed deep convolutional networks. Firstly, the proposed method shrinks the low-resolution image to the specified size in the upsampling stage. Secondly, it extracts the initial features of the low-resolution image in the feature extraction phase and sends them into the convolutional coding and decoding structure for image features. Thirdly, high-dimensional feature extraction and calculation are performed using dilated convolution in the reconstruction layer, and a high-resolution image is reconstructed. Finally, residual learning is used to speed up the training of the network while reducing noise, which improves the sharpness and visual quality of the reconstructed image. The proposed method is compared with super-resolution reconstruction methods such as Bicubic Interpolation, A+, SelfEX, SRCNN, and VDSR on the Set5, Set14, BSD100, and Urban100 datasets. The experimental results show that both the Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity (SSIM) are improved.
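PSNR, one of the two quality measures named in this abstract, follows directly from the mean squared error between a reference image and its reconstruction. The sketch below uses the standard definition for 8-bit images; the function name and the tiny 2x2 "images" are illustrative assumptions.

```python
import math

def psnr(reference, reconstructed, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two same-sized grayscale images.

    Images are given as lists of rows of pixel intensities.
    """
    ref = [pixel for row in reference for pixel in row]
    rec = [pixel for row in reconstructed for pixel in row]
    mse = sum((a - b) ** 2 for a, b in zip(ref, rec)) / len(ref)
    if mse == 0:
        return math.inf                      # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

original = [[0, 255], [128, 64]]
restored = [[0, 250], [130, 60]]
quality_db = psnr(original, restored)
print(round(quality_db, 2))
```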

Power consumption behavior analysis based on cluster analysis

ABSTRACT. With the construction of the smart grid, a large amount of user-side power data has been accumulated. This paper proposes a method for analyzing users' power consumption behavior based on a clustering algorithm. Firstly, the user load data is classified by season, and the user's seasonal power characteristics are analyzed according to the typical daily load curve of each season. Then the average temperature together with the load data is used as the feature, and the K-means clustering algorithm is used to explore the influence of temperature and holidays on users' electricity behavior in summer and winter respectively. This paper proposes a method of classifying and analyzing the different power consumption modes of a single user, which provides data support for the subsequent training of load prediction models on similar days, as well as for fine-grained management and demand-side management decisions for the power grid.
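The clustering step described in this abstract can be sketched with a plain K-means (Lloyd's algorithm) over per-day feature vectors. The example below assumes each day is reduced to two features already scaled to [0, 1], average temperature and average load; the paper's actual features, scaling, and initialization are not given in the abstract, so all values and names here are hypothetical.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=50):
    """Lloyd's algorithm with a deterministic start (the first k points)."""
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break                             # converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Hypothetical summer days: (scaled average temperature, scaled average load).
days = [(0.90, 0.80), (0.20, 0.30), (0.85, 0.90),
        (0.25, 0.20), (0.95, 0.85), (0.15, 0.25)]
labels, centers = kmeans(days, k=2)
print(labels, centers)
```

Here the hot, high-load days and the mild, low-load days fall into separate clusters. In practice a randomized or k-means++ start with several restarts is usually preferred over taking the first k points.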

Research on Anti-Occlusion Object Tracking Algorithm Based on URTSS and Improved KCF

ABSTRACT. The object tracking algorithm based on the Kernel Correlation Filter depends strongly on the spatial structure of the object, and it cannot effectively deal with interference factors such as occlusion and deformation. At the same time, the Histogram of Oriented Gradient feature cannot accurately express the object information in complex object tracking scenes. To address the problem that object tracking algorithms fail under occlusion, this paper improves the Kernel Correlation Filter algorithm. Firstly, an occlusion condition is added to the Kernel Correlation Filter algorithm. If there is no occlusion, the Kernel Correlation Filter algorithm is used for object tracking. If there is occlusion, the improved algorithm based on the Unscented Rauch-Tung-Striebel Smoother is used. Secondly, the predicted position of the object is fed back to the Kernel Correlation Filter algorithm. Finally, an adaptive multi-model combination is realized by combining the color histogram with the Kernel Correlation Filter algorithm, and the sparse representation method is introduced into the training process to enhance the robustness of the proposed tracking algorithm. The experimental results on the video sequences of the OTB-2013 dataset show that the proposed tracking algorithm can effectively reduce occlusion interference in the object tracking process and improve the accuracy and success rate of object tracking compared with the Kernel Correlation Filter algorithm.

Mitosis detection techniques in H&E stained breast cancer pathological images: A comprehensive review

ABSTRACT. Quantifying mitosis in pathological sections is of great significance in the pathological diagnosis of breast cancer as it is used to evaluate the spread and aggressiveness of the tumor and to provide more comprehensive and reliable information for accurate diagnosis and treatment. Mitosis is one of the important indicators for the classification and diagnosis of breast cancer. Automatic mitotic detection of breast cancer uses computer technology to automatically classify and label mitotic cells and nonmitotic cells in breast pathological images stained by hematoxylin-eosin (H&E), which plays an important role in assisting pathologists in diagnosis and treatment. In this paper, mainstream mitosis detection methods are summarized and classified into four categories according to the types of method: traditional methods, methods based on deep learning, methods combining traditional methods with methods based on deep learning, and other methods. In this paper, each method is introduced, and the performance indicators achieved by some of these methods and their results are discussed, compared, and evaluated. At the same time, some solutions to the problem of the imbalance of positive and negative samples in the mitotic dataset are summarized. Through the review of research methods in this field, the existing research methods used for mitosis in breast cancer are summarized. Finally, prospects for the future development of this field are discussed.

FPLD: An Automatic 3D Craniofacial Feature Points Location Based on Depth Map

ABSTRACT. 3D data registration is a basic problem in computer vision, and the calibration of feature points is an important part of it. The traditional manual calibration method is time-consuming and inaccurate. In this paper, we propose an automatic Feature Points Location method based on Depth map (FPLD) to solve the difficulty of feature definition and extraction on 3D craniofacial data with non-rigid deformation and morphological differences. Firstly, the 3D craniofacial data is transformed into a depth map, and then a feature point detection network is used to estimate the coordinates of the facial and skull feature points. These 2D depth image coordinates are then converted back to 3D coordinates, so as to obtain the accurate positions of the 3D skull and skin feature points. Extensive experiments have been carried out on 3D facial and skull data sets, which verify the effectiveness of the proposed method. This method solves the problem of feature point detection on complex topological surfaces and improves the accuracy of feature point detection on 3D craniofacial data, which lays a foundation for improving registration accuracy. It can be widely used in computer graphics and computer vision research such as 3D face recognition, texture matching, shape retrieval and matching, and 3D statistical shape analysis.

Non-Convex Penalty Based Multimodal Medical Image Fusion via Sparse Tensor Factorization

ABSTRACT. Nowadays, medical image fusion serves as a significant aid for the precise diagnosis or surgical navigation. In this paper, we propose a novel tensor factorization based fusion strategy which well combines the multimodal, multiscale nature of medical images and multiway structure of tensors. Since our model adopts the sparse representation (SR) prior, we suffer from the systematic underestimation of the true solution because of the L_1-norm regularization term. To address this problem, we introduce the generalized minimax-concave (GMC) penalty into our framework, which is a non-convex regularization term itself. It is beneficial for the whole cost function to maintain convexity. Furthermore, we combine the alternating direction method of multipliers (ADMM) algorithm and forward-backward (FB) method to achieve the optimization process. We conduct extensive experiments on five kinds of practical medical image fusion problems with 96 pairs of images in total. The results confirm that our model has great improvements in visual performance and objective metrics against the existing methods.

LAEDNet: A Lightweight Attention Encoder-Decoder Network for Ultrasound Medical Image Segmentation
PRESENTER: Yunchao Bao

ABSTRACT. Automatic ultrasound image segmentation plays an important role in the early diagnosis of human diseases. This paper introduces a novel and efficient encoder-decoder network, called Lightweight Attention Encoder-Decoder Network (LAEDNet), for automatic ultrasound image segmentation. In contrast to previous encoder-decoder networks that involve complicated architectures with numerous parameters, our LAEDNet adopts a lightweight version of EfficientNet as the encoder. In addition, a Lightweight Residual Squeeze-and-Excitation (LRSE) block is employed in the decoder. To achieve a trade-off between segmentation accuracy and implementation efficiency, we also present a family of models, from light to heavy (denoted as LAEDNet-S, LAEDNet-M, and LAEDNet-L, respectively), with varying lightweight versions of the EfficientNet backbone. To evaluate LAEDNet, we have conducted extensive experiments on the Brachial Plexus Dataset (BP) and the Breast Ultrasound Images Dataset (BUSI), where ultrasound images suffer from high noise, blurred borders, and low contrast. The experiments show that, compared with U-Net and its variants, e.g., M-Net and U-Net++, our LAEDNet achieves better results in terms of Dice Coefficient (DSC) and running speed. In particular, LAEDNet-M has only 10.75M model parameters and runs at 40.7 FPS, yet obtains 73.6% and 73.8% DSC on the BP and BUSI datasets, respectively.

A multi-scale auxiliary feature fusion small object detection algorithm based on improved Faster RCNN

ABSTRACT. Faster RCNN, as a classical detection algorithm, is effective in detection accuracy. However, because Faster RCNN loses the shallow features containing spatial information during feature extraction, it detects small objects poorly. Therefore, we introduce a multi-scale auxiliary feature fusion Faster RCNN small object detection algorithm. This algorithm consists of two modules: a multi-scale auxiliary feature network and a feature fusion module. Firstly, we introduce shallow features extracted by the multi-scale auxiliary feature network into the backbone network, to ensure that there is sufficient spatial information for detecting small objects even in the deepest features. Secondly, we fuse the auxiliary features and backbone features via the fusion module. Finally, to position the object proposal boxes more precisely in the ROI classification and regression network, we replace RoIPool with RoIAlign. Our experiments are conducted on the PASCAL VOC and KITTI autopilot datasets. Compared with some typical detection approaches, experimental results show that our improved Faster RCNN algorithm visibly increases the detection performance on small objects.

A brain tumor segmentation method based on improved 3D U-Net

ABSTRACT. The brain tumor is a disease that seriously threatens the life and health of patients. Automatic and accurate segmentation of brain tumors from MRI images is of vital importance to the diagnosis and treatment of patients. This paper proposes a convolutional network based on improved 3D U-Net for brain tumor segmentation tasks. The network as a whole has an Encoder-Decoder structure. An optimized residual block is added in the encoding stage of the model, and a deep supervision mechanism is introduced in the decoding stage. Then, a hybrid loss function is used to improve the training speed and segmentation performance. The model was trained and tested on the brain tumor dataset BraTS2018. The average Dice scores in the whole tumor area, tumor core area, and enhanced tumor area are 90.02%, 84.18%, and 79.61%, respectively. The proposed model in this paper has high segmentation performance, which can provide an effective reference for the diagnosis of brain tumor patients.
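The Dice score reported in this abstract measures the overlap between a predicted mask and the ground-truth mask: twice the size of their intersection divided by the sum of their sizes. The sketch below applies that standard definition to flattened binary masks; the function name and the five-voxel masks are illustrative assumptions, not data from the paper.

```python
def dice_score(prediction, target):
    """Dice coefficient between two flattened binary masks (sequences of 0/1)."""
    intersection = sum(p * t for p, t in zip(prediction, target))
    total = sum(prediction) + sum(target)
    if total == 0:
        return 1.0                            # both masks empty: treat as perfect agreement
    return 2.0 * intersection / total

predicted_mask = [1, 1, 0, 0, 1]
ground_truth = [1, 0, 0, 1, 1]
overlap = dice_score(predicted_mask, ground_truth)
print(overlap)
```

For multi-region results such as whole tumor, tumor core, and enhanced tumor, the same function would be applied once per region mask.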

Super-Resolution Reconstruction of Remote Sensing Image Based on Capsule Confrontation Network

ABSTRACT. Current Generative Adversarial Network (GAN) based super-resolution reconstruction methods are mainly built on Convolutional Neural Networks (CNN). These methods perform well on high-frequency details and visual effects, but CNNs lack the necessary attention to local spatial information connections, which leads to local detail problems in GAN-based super-resolution methods, such as excessive brightness and unnatural pixels in the image. How to generate better local image details through the connection of image spatial information is a key problem that needs to be solved. Therefore, the authors propose the super-resolution reconstruction network CapSRGAN, which uses a capsule network as the discriminator, together with a generator loss function that combines a vector inner product loss function and an adversarial loss function to train CapSRGAN. In the experiments, PSNR and SSIM were used to evaluate the image restoration effect. The final results show that the remote sensing super-resolution images generated by CapSRGAN are closer to the original images.

Improved threshold recognition of the coal and the gangue by using X-ray image

ABSTRACT. Traditional coal preparation methods include jigging coal preparation, dry coal preparation, and γ-ray coal preparation. Although these methods achieve coal preparation, they have problems such as low accuracy, high cost, long processing time, and serious health hazards. To address these problems, an improved threshold recognition method is developed using X-ray images. First, images of the coal and the gangue are obtained using an X-ray scanner, and then the gray values are obtained. Second, the thickness of the coal and the gangue is calculated. Third, the gray value and the thickness information of the coal and the gangue are combined, and the separation threshold is determined. Finally, the recognition of the coal and the gangue is realized. The experimental results show that the recognition accuracy can reach about 98%.

Joint Appearance Enhancement and Temporal Difference Representations for Video-based Person Re-Identification

ABSTRACT. Video-based person re-identification aims at matching the same person across video clips. The key to tackling this challenging task is to model both appearance and temporal clues in the video. We find that the generic appearance features within the video are always reliable cues for the target pedestrian, and that the differences between adjacent frames are also a kind of temporal information. Thus, in this paper we propose a novel Appearance Enhancement and Temporal Differences Representation (AETDR) framework to better capture the appearance and temporal characteristics of the pedestrian. The framework mainly contains two independent branches: the Appearance Boosting Module (ABM) and the Frame Differences Representation Module (FDRM). Specifically, the Appearance Boosting Module employs a non-parametric way to measure the importance of the image features. The robust appearance features of the target pedestrian can then be obtained by aggregating the image features based on the importance scores. The Frame Differences Representation Module, in turn, focuses on the difference features between adjacent frames to extract more discriminative temporal features. Finally, we fuse the temporal and appearance features to yield a video representation. As shown in the experiments, our model is competitive with other state-of-the-art methods on the MARS and DukeMTMC-VID benchmarks.

Context-aware Network for Pulmonary Nodule detection in CT Images

ABSTRACT. Suffering from low-resolution images and straightforward features, existing R-CNN based methods for pulmonary nodule detection usually fail to detect small-scale objects. In this paper, we propose a novel context-aware network which takes the pulmonary regions and their neighbors for joint learning. The contextual cues of these regions reinforce each other, which is beneficial for the detection of small regions. Moreover, a set of redesigned anchors is used to adapt to pulmonary nodules of various sizes. To avoid dilution by redundant samples of large nodules, a data enhancement strategy is implemented in the training stage by identifying hard samples. We test the proposed network on a dataset with 2000 lung images and demonstrate that it performs well in detecting lung nodules of various sizes.

A Hybrid-Attention semantic segmentation network for remote sensing interpretation in land-use surveillance

ABSTRACT. Remote sensing interpretation in land-use surveillance needs to mark out specific areas on unmanned aerial vehicle or satellite images. According to the surveillance rules, these marked areas usually show surface features and textures of the target that differ obviously from the surroundings in the images. Semantic segmentation is often used for automatic interpretation of image regions, since a large receptive field lets it accommodate both the target and its neighborhood. Because the artificial disturbance areas have various types and sizes on remote sensing images, and some categories do not appear in common data sets, small targets are lost when semantic segmentation is applied. This paper proposes a semantic segmentation method that has better segmentation accuracy for multi-scale targets and rare categories. Thanks to the spatial attention mechanism, the current position can obtain the global correlation. Furthermore, the channel attention mechanism is used to assign higher weights to task-related channels after concatenating feature maps. Small-scale objects can be recognized and feature information can be better used. Experiments on an open remote sensing dataset show that the method achieves competitive segmentation accuracy, which indicates that these attention mechanisms provide a significant enhancement compared with state-of-the-art networks. In a practical application, a dataset of construction specification violations was used, and the method achieved good performance.

Attention meta-transfer learning for iris recognition

ABSTRACT. Iris recognition is a hot research field in biometrics, and it plays an important role in automatic recognition systems. When a sufficient amount of labeled data is available, some iris recognition algorithms combined with deep learning have achieved excellent performance. When the number of samples is limited, over-fitting often occurs if deep learning methods are directly used for training, which degrades recognition performance. The learning problem with insufficient sample size can be solved by using few-shot learning methods. In this article, we use the meta-transfer learning (MTL) method to solve the problem of small-sample iris recognition, and we propose an attention meta-transfer learning (Attention MTL) approach for iris recognition based on an improved attention network model of our own design. Experiments on the benchmark datasets show that our Attention MTL further improves recognition accuracy and is comparable to advanced techniques in the field of iris recognition.

Passive Video Surveillance System

ABSTRACT. A passive video surveillance system is designed in this paper. It captures images through a camera sensor and displays them in a client program without a battery or complicated wiring. The system consists of three modules: energy harvesting, data transmission, and data processing. First, the system transmits UHF radio frequency signals to the environment through a signal transmitter. It then converts the received radio frequency signal into an electrical signal and stores it in an on-board supercapacitor to supply power to each device in the system. The system dispatches the image capture task through a low-power microprocessor, and transmits the data to a host on the same local network through a Wi-Fi module.

Tag generation method based on topic information

ABSTRACT. The traditional tag generation method for text resources is based only on the information of the text itself. It ignores words with low frequency but high topic relevance, resulting in low accuracy of tag generation. This paper therefore builds on the traditional TextRank model, using the document-topic distribution and the distribution of words under the corresponding topics to measure the importance of words in the document and to adjust the random jump probability of nodes. Then, the similarity between word vectors and statistical feature information are used to update the weight of word nodes iteratively. As a result, a new word graph model is constructed to generate text tags. Compared with the traditional TF-IDF, TextRank and other related algorithms, the experimental results of our model on real datasets demonstrate the effectiveness of the proposed method, which improves accuracy, recall and F value.
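The idea of adjusting the random jump probability of TextRank nodes by topic relevance can be sketched as a personalized PageRank over a word graph. This is only an illustrative sketch, not the authors' model: the function name `topic_biased_textrank`, the `edges` and `topic_weight` inputs, and the damping value are all assumptions, and the word-vector and statistical-feature weighting described in the abstract is left out.

```python
def topic_biased_textrank(edges, topic_weight, damping=0.85, iters=50):
    """Score words on an undirected weighted graph.

    edges:        {(word_a, word_b): co-occurrence weight}  (hypothetical input)
    topic_weight: {word: topic relevance > 0}               (hypothetical input)
    Every word is assumed to have at least one neighbor.
    """
    words = sorted(topic_weight)
    total = sum(topic_weight.values())
    # Jump distribution: topic-relevant words are more likely restart targets.
    jump = {w: topic_weight[w] / total for w in words}

    neighbors = {w: {} for w in words}
    for (a, b), weight in edges.items():
        neighbors[a][b] = weight
        neighbors[b][a] = weight
    out_sum = {w: sum(neighbors[w].values()) for w in words}

    score = {w: 1.0 / len(words) for w in words}
    for _ in range(iters):
        new_score = {}
        for w in words:
            incoming = sum(score[v] * weight / out_sum[v]
                           for v, weight in neighbors[w].items())
            new_score[w] = (1 - damping) * jump[w] + damping * incoming
        score = new_score
    return score


# Three words with identical graph structure; only the topic weight differs.
edges = {("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 1.0}
scores = topic_biased_textrank(edges, {"a": 0.1, "b": 0.1, "c": 0.8})
```

In plain TextRank the jump term is uniform, so the three words above would tie. With the topic-weighted jump, the word with the highest topic relevance ranks first even though its graph position is the same.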

Essential Multi-view Graph Learning for Clustering

ABSTRACT. Multi-view clustering utilizes information from diverse views to improve the performance of clustering. Most existing multi-view spectral clustering methods integrate information from different views by pursuing a consensus similarity matrix for clustering. However, view-specific structures, which contain the complementary information of multi-view data, may be lost during the clustering process. In fact, in multi-view spectral clustering, the similarity matrices of multiple views should share the same clustering structures or properties rather than be numerically identical. To overcome the aforementioned problem, a novel Essential Multi-view Graph Learning (EMGL) method for clustering is proposed in this paper. Different from most existing multi-view spectral clustering methods, an orthogonal matrix factorization is imposed on multi-view similarity matrices so that they have the same nuclear norm, which indicates the same clustering structures of different views. Furthermore, we also propose an Augmented Lagrangian Multiplier (ALM) based optimization algorithm to solve the objective function of our method. Experiments on several datasets demonstrate the superior performance of our proposed method.

Recurrent Attention Hashing for Medical Cross-Modal Retrieval

ABSTRACT. Medical cross-modal retrieval aims to search for semantically similar medical instances across different modalities, such as searching X-ray images using radiology reports or searching radiology reports using X-ray images. The main challenges for medical cross-modal retrieval are the semantic gap and the small visual differences between different categories of medical images. To address these issues, we present a novel end-to-end deep hashing method, called Recurrent Attention Hashing for Medical Cross-Modal Retrieval (RAH-MCMR), which extracts global features using global average pooling and local features using recurrent attention. Specifically, we proceed recursively from the coarse region to the fine-grained region of images to locate the discriminative region more accurately, and recursively from the sentence level to the word level to extract the discriminative semantic information for text. Then, we select the discriminative features by aggregating the finer features via adaptive attention. Finally, to reduce the semantic gap, we map image and report features into a common space and obtain the discriminative hash codes. Comprehensive experimental results on the large-scale medical dataset MIMIC-CXR and the natural scene dataset MS-COCO show that RAH-MCMR can achieve better performance than existing cross-modal hashing methods.

Semantic Constraints Matrix Factorization Hashing for Cross-modal Retrieval

ABSTRACT. Cross-modal hashing methods have attracted considerable attention due to their low memory usage and high query speed in large-scale multi-modal retrieval. During the encoding process, there still remain two crucial bottlenecks: how to equip hash codes with comprehensive information, and how to rapidly obtain hash codes. In this paper, we propose Semantic Constraints Matrix Factorization Hashing (SCMFH), which simultaneously preserves intra-modal, inter-modal and category-level information with O(N) running time. Specifically, we factorize the original feature representations into individual latent semantic representations, and then use the semantic similarity matrices to constrain the correlation of the individual semantic representations. Finally, we regress the class labels to binary codes, and obtain hash codes directly by an efficient optimization with a closed-form solution. Extensive experimental results on three public datasets demonstrate that the proposed method outperforms many existing cross-modal hashing methods.
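The low memory usage and high query speed of hashing-based retrieval come from comparing compact binary codes with the Hamming distance. The following minimal sketch illustrates only that generic retrieval step, not SCMFH itself: sign binarization, the helper names, and the toy database are assumptions made for illustration.

```python
def to_code(values):
    """Binarize a real-valued vector by sign and pack the bits into an int."""
    code = 0
    for v in values:
        code = (code << 1) | (1 if v >= 0 else 0)
    return code


def hamming(a, b):
    """Number of bit positions in which two packed codes differ."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query_code, database):
    """Return database keys ordered from nearest to farthest code."""
    return sorted(database, key=lambda key: hamming(query_code, database[key]))


# Hypothetical 4-bit codes: an image query against three text items.
query = to_code([0.3, -1.2, 0.8, -0.1])  # bits 1010
database = {"report_a": 0b1010, "report_b": 0b1000, "report_c": 0b0101}
ranking = rank_by_hamming(query, database)
```

Each comparison is one XOR plus a bit count, which is why codes of a few dozen bits can be scanned quickly even for large collections.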

Hierarchical K-means clustering for registration of multi-view point sets

ABSTRACT. As a long-standing research issue in computer vision and robotics, multi-view registration has attracted much attention in recent years. Most existing works focus mainly on estimating point-to-point correspondences, which usually suffers from poor initial poses as well as data noise and therefore leads to inaccurate matches. To overcome this limitation, we propose a novel Hierarchical K-means Clustering Registration (HKCR), which casts multi-view registration as a hierarchical clustering task. Specifically, the proposed method first employs a small number of clusters, and then increases the number of clusters during the registration process. Benefiting from the recursive partitioning process, more robust and more accurate results can be achieved as the granularity becomes increasingly fine. To show the effectiveness and robustness of HKCR, extensive experiments are conducted on several benchmark datasets and compared to several state-of-the-art methods.
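The coarse-to-fine schedule, starting with few clusters and increasing their number, can be sketched on a single point set as follows. This is a simplified illustration under stated assumptions, not the HKCR algorithm: it omits the rigid-transformation estimation that registration requires, and the centroid-splitting rule, the function names, and the `eps` offset are hypothetical choices.

```python
def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [min(range(len(centroids)),
                key=lambda k: sum((p - c) ** 2
                                  for p, c in zip(pt, centroids[k])))
            for pt in points]


def kmeans(points, centroids, iters=20):
    """Plain Lloyd iterations starting from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iters):
        labels = assign(points, centroids)
        for k in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = tuple(sum(x) / len(members)
                                     for x in zip(*members))
    return centroids


def coarse_to_fine(points, levels=3, eps=1e-3):
    """Return centroid sets with 2, 4, ..., 2**levels clusters."""
    centroids = [tuple(sum(x) / len(points) for x in zip(*points))]
    history = []
    for _ in range(levels):
        # Split every centroid into two slightly offset copies, then refine.
        split = []
        for c in centroids:
            split.append(tuple(v + eps for v in c))
            split.append(tuple(v - eps for v in c))
        centroids = kmeans(points, split)
        history.append(centroids)
    return history


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
history = coarse_to_fine(points, levels=3)
```

Each finer level is initialized from the previous, coarser solution, so early levels capture the overall layout before later levels resolve detail.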

Attention Mutual Teaching Network for Unsupervised Domain Adaptation Person Re-identification

ABSTRACT. Person re-identification (ReID) is an important task in computer vision. Most methods based on supervised strategies have achieved high performance. However, performance cannot be maintained when these methods are applied without labels because styles in different scenes exhibit considerable discrepancy. To address this problem, we propose an attention mutual teaching (AMT) network for unsupervised domain adaptation person ReID. The AMT method improves the performance of a model through iterative clustering and retraining. Meanwhile, two attention modules can teach each other to reduce clustering noise. We conduct extensive experiments on the Market-1501 and DukeMTMC-reID datasets. The experiments show that our approach performs better than state-of-the-art unsupervised methods.

Binocular Ranging Based on Convolutional Neural Network

ABSTRACT. Binocular vision is widely used in robot navigation, precision industrial measurement, object recognition, virtual reality, scene reconstruction and other fields. In this work, we study two common computer vision tasks: binocular vision ranging and semantic segmentation. Our ranging method uses a deep learning model to balance computational cost and accuracy at inference, and we propose a disparity prediction network that incorporates edge information. By adding the edge information of the input image at the input end, the disparity prediction achieves better accuracy at boundaries and in fine details. Even when trained on limited ground truth (200 KITTI images), it can provide a more accurate boundary than any image-based depth estimation (monocular or stereo). The experimental results show that the proposed method is feasible and effective.
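For a rectified stereo pair, a predicted disparity is turned into a range with the standard pinhole relation Z = f * B / d, where f is the focal length in pixels, B the baseline in meters, and d the disparity in pixels. The sketch below shows only this conversion, not the proposed network; the function name and the camera values are illustrative assumptions.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in meters of a point seen by a rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px


# Hypothetical camera: 700 px focal length, 0.5 m baseline.
depth_m = depth_from_disparity(disparity_px=50.0, focal_px=700.0,
                               baseline_m=0.5)
```

Because depth is inversely proportional to disparity, a fixed disparity error costs more range accuracy for distant objects, where disparities are small.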

13:00-18:00 Session 11: Keynote Speech (Prof. Shenglin Mu, Prof. Joze Guna)

Prof. Joze Guna's PPT https://we.tl/t-YaiNg3Kcd7

Chair:
13:00
An Eye-Gaze Input Interface using Deep Learning
14:00
Advances and Challenges in Combating VR Sickness Effects