ISAIR2021: THE 6TH INTERNATIONAL SYMPOSIUM ON ARTIFICIAL INTELLIGENCE AND ROBOTICS
PROGRAM FOR SUNDAY, AUGUST 22ND

08:00-09:00 Session 5: Keynote Speech (Prof. Manu Malek)
Chair:
08:00
Internet of Things: Applications, Enablers, Security
09:00-10:00 Session 6: Award Ceremony & Best Papers Presentation
Chair:
09:00
Deep Capsule Network for Recognition and Separation of Fully Overlapping Handwritten Digits

ABSTRACT. The recognition and separation of fully overlapping handwritten digits is an effective test of a network's recognition ability; it is also the basis for separating overlapping complex handwritten characters, such as handwritten English and handwritten Chinese. At present, most convolutional neural networks are unable to achieve good results on this problem. Based on the concept of the "capsule", this paper constructs a deep capsule network, FOD_DCNet, for the recognition and separation of fully overlapping handwritten digits, which consists of a "recognition network" and a "reconstruction network". In the recognition network, firstly, smaller convolution kernels are used to extract features, which favors fine-grained features and greatly reduces the number of training parameters; secondly, higher-dimensional capsules are used to express the extracted features and avoid the loss and omission of features; thirdly, two dynamic routing stages are placed in series to optimize routing-based classification, with the front routing performing coarse classification and the latter fine classification. In this way, the number of iterations per routing stage can be reduced and classification efficiency improved without increasing the total workload. Finally, a reconstruction network is built that receives the classification results of the recognition network and separates them into two individual images. Experimental results show that, compared with CapsNet, FOD_DCNet has fewer training parameters, shorter training time, and a higher ability to recognize and separate overlapping digits.
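For readers unfamiliar with capsule networks, the dynamic routing the abstract refers to builds on routing-by-agreement between capsule layers; the paper's specific serial dual-routing design is not reproduced here. The sketch below is only a minimal NumPy illustration of the standard squash non-linearity and a single routing pass (all names and shapes are illustrative assumptions, not details of FOD_DCNet):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Capsule squash non-linearity: short vectors -> ~0, long vectors -> length ~1."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def route(u_hat, n_iter=3):
    """One dynamic-routing pass; u_hat holds prediction vectors, shape (n_in, n_out, dim_out)."""
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                                  # routing logits
    for _ in range(n_iter):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)     # coupling coefficients
        s = (c[..., None] * u_hat).sum(axis=0)                   # weighted sum per output capsule
        v = squash(s)                                            # output capsule vectors
        b += (u_hat * v[None]).sum(axis=-1)                      # agreement update
    return v

# toy example: 6 input capsules routed to 10 output capsules of dimension 16
print(route(np.random.randn(6, 10, 16)).shape)                   # (10, 16)
```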

09:20
Multi-level Relevance Extraction and Aggregation for Chinese Semantic Matching

ABSTRACT. Semantic matching plays a critical role in a variety of tasks in natural language processing. Although deep learning has been applied successfully on semantic matching task, limited work has been done on Chinese texts. Most of the existing methods focus on English texts and pay attention to encoding sentences on word level and modeling semantic interactions on sentence level. However, Chinese is more sophisticated than English; it can be segmented on character and word granularity, each of which contains different semantic information. How to model and capture the complex matching features contained in different granularities, i.e., character, word and sentence levels, in Chinese texts is very challenging and yet rarely explored. In this paper, we propose Multi-level Relevance Extraction and Aggregation, MREA, for Chinese sentence semantic matching task. In MREA, we first devise multiple attention modules to extract the semantic relevance features on character, word and sentence levels simultaneously, then implement a feature aggregation module to aggregate abundant relevance features into the final semantic matching representation which is used to evaluate the matching degree of Chinese sentences. Extensive experiments on two real-world datasets demonstrate that MREA outperforms the compared neural semantic matching methods, and achieves comparable performance with BERT-based methods.

09:40
Scaled Gated Networks

ABSTRACT. Gating transformations show great potential in recent deep convolutional neural network design: they enrich the feature representation and suppress noisy signals by modeling inter-channel dependencies with learnable parameters. Neural-architecture-search-based model scaling methods reduce model redundancy under computational budget limitations, achieving a remarkable balance between model complexity and performance. However, scaling-based approaches to reducing the redundancy of hand-crafted attention mechanisms have rarely been investigated. This paper proposes a novel scaled gated convolution that enables attention-enhanced CNNs to overcome the paradox between performance and redundancy. The scaled gated convolution is a simple and effective hand-crafted alternative to both vanilla convolution and attention-enhanced convolutions, and can easily be applied to modern CNNs in a plug-and-play manner without introducing extra complexity or tuning network architectures. Exhaustive experiments demonstrate that stacking the scaled gated convolution in baselines significantly improves performance on a broad range of visual recognition tasks, including image recognition, object detection, instance segmentation, keypoint detection and panoptic segmentation, while obtaining a better trade-off between performance and attentive redundancy than other counterparts.
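The exact form of the scaled gated convolution is not given in the abstract. As a rough reference only, the kind of learnable channel gating it builds on can be sketched as a squeeze-and-excitation-style block in PyTorch (module and parameter names below are illustrative assumptions, not the authors' design):

```python
import torch
import torch.nn as nn

class ChannelGate(nn.Module):
    """Generic channel gating: model inter-channel dependencies and rescale each channel."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: global spatial context
        self.fc = nn.Sequential(                     # excitation: per-channel gate in (0, 1)
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        gate = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * gate                              # gated (attention-enhanced) features

# plug-and-play usage after any convolution, e.g. on a 64-channel feature map
feat = torch.randn(2, 64, 32, 32)
print(ChannelGate(64)(feat).shape)                   # torch.Size([2, 64, 32, 32])
```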

10:00-12:00 Session 7: Keynote Speech (Prof. Yasushi Makihara, Prof. Weihua Ou)
10:00
Video-based Gait Analysis and Its Applications
11:00
Cross Modality Retrieval and Applications
13:00-15:00 Session 8: Oral Session 3
13:00
Study on Eye-Interface System with Image Information from Two Cameras Using Deep Learning Method

ABSTRACT. The lack of effective communication methods is one of the most severe issues that people with physical disabilities, such as ALS (amyotrophic lateral sclerosis), face in their social lives. In this research, an eye-interface system with an image processing method based on deep learning, using image information from two cameras, is proposed. The proposed system is inexpensive and easy to install, and works under natural light without the health risks associated with infrared radiation. In the proposed system, convolutional neural networks (CNN) are applied to improve the accuracy for practical use. The CNN in this study estimates the gaze position on the monitor screen from the images acquired from the cameras, and aims to obtain higher accuracy than the conventional system by learning for specific individuals.

13:10
Proposal of Automatic Trash System of Pet Sheet for Driving
PRESENTER: Airi Taniguchi

ABSTRACT. Many animals, including pets and assistance dogs for the physically disabled, are all around us. However, contact with them also carries the risk of many zoonotic diseases. One of the routes of infection is through excrement, so we have developed a device that automatically folds pet sheets. In addition, we have developed a system that can remotely control this device so that it can be used in a car. The system consists of seven servo motors, a DC motor, and a pair of communication devices. When a button on the communication device is pressed while driving, the system automatically folds the pet sheets along the creases and throws them into the trash. This allows pet owners to dispose of their pets' waste without touching the pet sheet, thus preventing zoonotic diseases. Experiments on the disposal of pet sheets with urine and feces on them showed a 100% success rate.

13:20
Comparison of grape flower counting using patch-based instance segmentation and density-based estimation with Convolutional Neural Networks

ABSTRACT. Information on flower number per grapevine inflorescence is critical for grapevine genetic improvement, early yield estimation and vineyard management. Previous approaches to automating this process with traditional image processing techniques, such as color and morphology analysis, have failed to yield a universal system applicable to multiple grapevine cultivars during different growth stages under various illumination conditions. Deep neural networks present numerous opportunities for image-based plant phenotyping. In this study, we evaluated three deep-learning-based approaches for automatic counting of flower numbers on inflorescence images, built on instance segmentation using Mask R-CNN, object density-map estimation using U-Net, and patch-based instance segmentation using Mask R-CNN, respectively. The results were analyzed on a publicly available grapevine inflorescence dataset of 204 images of four different cultivars during various growth stages, providing high diversity in inflorescence morphology. The algorithm based on patch-based instance segmentation using Mask R-CNN produced counting results highly correlated with manual counts (R2 = 0.96), with practically constant MAPE values across cultivars (from 5.50% to 8.45%), implying high robustness. Achieving the fastest counting (0.33 s per 512 × 512 image) with slightly lower counting accuracy (R2 = 0.91), the method based on object density-map estimation turned out to be suitable for real-time flower counting systems.
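In density-based counting, the predicted count is simply the integral (sum) of the density map, since each annotated flower contributes unit mass to the ground-truth map. A minimal NumPy/SciPy illustration of the target construction and counting step follows (this is a generic sketch, not the authors' U-Net pipeline; point coordinates and sigma are made up):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def density_target(points, shape, sigma=4.0):
    """Ground-truth density map: one unit of mass per annotated flower centre."""
    dmap = np.zeros(shape, dtype=np.float32)
    for y, x in points:
        dmap[int(y), int(x)] += 1.0
    return gaussian_filter(dmap, sigma)   # smoothing preserves the total mass

# three annotated flowers on a 512x512 inflorescence image
gt = density_target([(100, 120), (101, 130), (300, 400)], (512, 512))
print(round(gt.sum()))                    # 3 -- the count is the integral of the map
```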

13:30
Application of transformers for predicting epilepsy treatment response

ABSTRACT. There is growing interest in machine learning based approaches to assist clinicians in treatment selection. In the treatment of epilepsy, a common neurological disorder that affects 70 million people worldwide, previous research has employed scoring methods generated from traditional machine learning methods based on pre-treatment patient characteristics to classify those with drug-resistant epilepsy (DRE). In this study, we used an attention-based approach in predicting the response to different antiseizure medications (ASMs) in individuals with newly diagnosed epilepsy. By applying a conventional transformer to model the patient’s response, we can use the predicted probability to determine the success rate of specific ASMs. Applying the transformer allowed the model to place attention on patient information and past treatments to model future drug responses. We trained a conventional transformer model based on one cohort of 1536 patients with newly diagnosed epilepsy, compared its performance with other trained models using RNN and LSTM, and applied it to a validation cohort of 736 patients. In the development cohort, the transformer model showed the highest accuracy (81%) and AUC (0.85), and maintained similar accuracy and AUC (74% and 0.79, respectively) in the validation cohort.

13:40
Path Planning for Unmanned Aerial Vehicles in Constrained Environments for Locust Elimination

ABSTRACT. Present-day agricultural practices such as blanket spraying not only lead to excessive usage of pesticides but also harm the overall crop yield. This paper introduces an algorithm to optimize the traversal of an Unmanned Aerial Vehicle (UAV) in constrained environments. The proposed system focuses on the agricultural application of targeted spraying for locust elimination. Given a satellite image of a farm, target zones prone to locust swarm formation are detected through calculation of the Normalized Difference Vegetation Index (NDVI). The optimal path for UAV traversal through these target zones is then determined using the proposed algorithm, so that pesticide spraying is performed in the most efficient manner possible. Unlike the classic travelling salesman problem involving point-to-point optimization, the proposed algorithm determines an optimal path for multiple regions, independent of their geometry. The savings obtained with the proposed method, compared to the conventional method, are directly proportional to the total non-infested area of the agricultural land. Finally, the paper explores the idea of implementing reinforcement learning to model complex environmental behaviour and make the path planning mechanism for UAVs agnostic to external environment changes. This system not only presents a solution to the enormous losses incurred due to locust attacks, but also an efficient way to automate agricultural practices across the globe and improve farmer ergonomics.
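NDVI itself is a simple band ratio, NDVI = (NIR − Red) / (NIR + Red). A sketch of how vegetated target zones might be masked from a multispectral image before path planning is shown below; the threshold value and band layout are illustrative assumptions, not values from the paper:

```python
import numpy as np

def ndvi(nir: np.ndarray, red: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Normalized Difference Vegetation Index, in [-1, 1]."""
    return (nir - red) / (nir + red + eps)

def target_zones(nir, red, threshold=0.4):
    """Binary mask of vegetated pixels prone to locust swarm formation."""
    return ndvi(nir, red) > threshold

# toy 2x2 example: only the bright-NIR pixel exceeds the vegetation threshold
nir = np.array([[0.8, 0.2], [0.1, 0.3]])
red = np.array([[0.1, 0.2], [0.1, 0.3]])
print(target_zones(nir, red))   # [[ True False] [False False]]
```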

13:50
Automatic Annotation Method for Food Classification in Lunchbox

ABSTRACT. In this paper, we propose an automatic annotation method that extracts food regions from images collected on the Web, and a food classification method using a fine-tuned VGG16 model. The annotation-free segmentation method can automatically create a large number of training images with pixel-wise annotation. The food classification method can segment food regions at low computational cost. In the experiment, food classification was performed on lunchbox images, and recall and precision were calculated to evaluate the performance of the proposed method. The results confirmed that the proposed method can obtain the information necessary for meal assistance robots.

14:00
Dynamic adaptive residual network for liver CT image segmentation

ABSTRACT. Because the gray values of the liver and the surrounding tissues and organs are similar in abdominal CT images, it is difficult to accurately determine the boundary of the liver. To address this issue, we propose a segmentation method based on a dynamic adaptive residual network (DAR-net). Firstly, inspired by the U-net architecture, a dynamic adaptive pooling strategy based on interpolation optimization is employed to process all features in the pooling domain; shallow and deep features are also integrated by skip connections to adequately learn the features of different layers. Next, a residual network composed of batch processing and PReLU is introduced to improve the generalization ability and convergence speed of DAR-net and to alleviate over- and under-segmentation. Furthermore, a conditional random field method is employed to refine the boundaries and textures of the liver and avoid inaccurate liver boundaries. Finally, 3D reconstruction of the liver is performed on the segmentation results via the Visualization Toolkit (VTK) and Insight Toolkit (ITK). Experiments on the standard 3DIRCADB dataset demonstrate that DAR-net achieves an average Dice score of 96.13%, an increase of 13.02% over the prediction result without any processing. DAR-net effectively improves the accuracy of liver segmentation and outperforms five state-of-the-art methods.
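The Dice score used for evaluation is the standard overlap measure 2|A∩B| / (|A| + |B|); a minimal NumPy version for binary liver masks (the mask values here are a made-up toy example):

```python
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# toy example: the masks agree on 2 of their 3 foreground pixels each
pred   = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 1, 0], [0, 0, 1]])
print(round(dice_score(pred, target), 3))   # 0.667
```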

14:10
Center Heatmap Attention For Few-Shot Object Detection

ABSTRACT. With the development of computer vision and deep learning, convolutional neural networks have been widely used in image processing tasks such as object detection and semantic segmentation and have achieved breakthrough results. However, when training samples are insufficient, conventional neural networks usually show unsatisfactory robustness. To address this problem, we improve the generalization performance of few-shot detectors by focusing on the target center, enabling them to identify novel categories. The paper proposes a new attention mechanism based on an auxiliary circle feature map of the object center: an auxiliary circle feature map, with the object center as the center of the circle and the smaller of the object's height and width as the diameter, is added to the anchor-free CenterNet network as soft attention to promote network training. Experiments on the PASCAL VOC2007/2012 datasets show that the proposed method achieves state-of-the-art accuracy and standard deviation for few-shot object detection, which indicates the algorithm's effectiveness.

14:20
Global Attention Network for Co-saliency Detection

ABSTRACT. Co-saliency detection aims to identify common and salient objects or regions in a group of related images. The major challenge is how to extract useful features from single images and image groups to express collaborative saliency cues. This paper proposes a global attention network for co-saliency detection that integrates the salient features of individual images with collaborative correlation features. Firstly, we feed the input images into a feature enhancement module to produce preprocessed features. Then, to increase global context information, we perform global information operations on the preprocessed features and embed non-local modules in the backbone network. Finally, we build a collaborative correlation module to extract collaborative and consistent information. Specifically, we first extract the deep semantic features of each image, use channel attention and spatial attention to enhance the network's learning along the channel and spatial dimensions, and use the non-local module to complement the local features extracted by the convolution operation. The collaborative semantic features are obtained by computing the correlation between the individual features of the input images in the collaborative correlation module. We evaluate our method on two co-saliency detection benchmark datasets (CoSal2015, iCoseg); it obtains significant improvements over most state-of-the-art methods.

14:30
An Artificial Bee Colony Algorithm with an Improved Updating Strategy

ABSTRACT. The artificial bee colony (ABC) algorithm shows a relatively powerful exploratory search capability but a slow convergence rate, especially on unimodal functions. In this paper, an improved artificial bee colony algorithm is introduced to shorten its computation time. In the proposed algorithm, two novel update equations, utilizing the social experience of the whole population, are proposed to boost the performance of the employed bees and onlooker bees respectively. The effectiveness of our algorithm is validated on basic benchmark functions. Furthermore, a feed-forward artificial neural network model is also employed to verify the effectiveness of our algorithm. The experimental results show that the proposed IUABC algorithm achieves better performance than the other compared algorithms.
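In the basic ABC algorithm a bee perturbs one dimension of its food source as v_j = x_j + φ·(x_j − x_kj); gbest-guided variants add a term pulled toward the best solution found so far, which is one common way of injecting the "social experience of the whole population" the abstract mentions. The sketch below shows that generic update with greedy selection; the paper's exact equations are not given in the abstract, so everything here is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

def candidate(x, pop, gbest):
    """gbest-guided ABC update for one food source x (one random dimension changed).
    v_j = x_j + phi*(x_j - x_kj) + psi*(gbest_j - x_j),  phi in [-1,1], psi in [0,1.5]."""
    v = x.copy()
    j = rng.integers(x.size)                       # dimension to perturb
    k = rng.integers(len(pop))                     # random neighbour food source
    phi = rng.uniform(-1.0, 1.0)
    psi = rng.uniform(0.0, 1.5)
    v[j] = x[j] + phi * (x[j] - pop[k][j]) + psi * (gbest[j] - x[j])
    return v

# greedy selection on a unimodal sphere function
f = lambda s: float(np.sum(s ** 2))
pop = [rng.uniform(-5, 5, size=4) for _ in range(10)]
gbest = min(pop, key=f)
x = pop[0]
v = candidate(x, pop, gbest)
x = v if f(v) < f(x) else x                        # keep the better of the two
print(f(x) <= f(pop[0]))                           # True: never worse after selection
```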

14:40
Information transmission system based on visual recognition between internal and external networks under physical isolation
PRESENTER: Fengyi Li

ABSTRACT. With the rapid development of the Internet, using it to carry out work has become an irreversible trend. How to transmit data between internal and external network devices that are physically isolated is an important issue currently faced. Two-dimensional codes can encode a variety of text and image information and can serve as the main carrier of information exchange. This article uses the text encoding function of two-dimensional codes together with file cutting technology to propose a data transmission system based on visual recognition for internal and external networks under physical isolation. At the sending end, the file to be transmitted is segmented, and multiple files and corresponding two-dimensional codes are generated. After the receiving end scans and recognizes the two-dimensional codes, a complete file identical to the original file at the sending end is reassembled, which meets the needs of daily file transmission between internal and external networks. The receiving end is further extended: according to instruction text sent by the sending end, it can retrieve local pictures or information from the database and then display them on the screen or send them back to the sending-end computer.
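A rough sketch of the sending-end idea (file cutting plus per-chunk QR encoding) using the widely available qrcode package is given below; the chunk size, payload header format, file naming and base64 wrapping are illustrative choices, not details from the paper:

```python
import base64
import qrcode   # pip install qrcode[pil]

def file_to_qr_images(path, chunk_size=512):
    """Cut a file into chunks and render one QR code image per chunk.
    Each payload carries an index/total header so the receiver can reassemble."""
    data = open(path, "rb").read()
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    for idx, chunk in enumerate(chunks):
        payload = f"{idx}/{len(chunks)}|" + base64.b64encode(chunk).decode("ascii")
        qrcode.make(payload).save(f"chunk_{idx:04d}.png")
    return len(chunks)

def reassemble(payloads):
    """Receiving end: order the scanned payloads by index and concatenate the chunks."""
    ordered = sorted(payloads, key=lambda p: int(p.split("/", 1)[0]))
    return b"".join(base64.b64decode(p.split("|", 1)[1]) for p in ordered)

# sending end: n = file_to_qr_images("report.docx") -> chunk_0000.png ... chunk_{n-1}.png
# receiving end scans the codes with a camera (e.g. via pyzbar), then calls reassemble(...)
```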

14:50
Bayesian Estimation of the Inverted Beta-Liouville Mixture Models with Extended Variational Inference
PRESENTER: Wenbo Guan

ABSTRACT. This paper addresses the Bayesian estimation of the inverted Beta-Liouville mixture model (IBLMM), which has a fairly flexible capability for modeling positive data. This problem does not usually admit an analytically tractable solution. Sampling approaches (e.g., Markov chain Monte Carlo (MCMC)) can be used to address it, but they are usually computationally demanding and may therefore be impractical for real-world applications. We instead adopt the recently proposed extended variational inference (EVI) framework to address this problem in an elegant way. First, some lower-bound approximations are introduced into the evidence lower bound (ELBO) (i.e., the original objective function) of the conventional variational inference (VI) framework, which yields a computationally tractable lower bound. Then, a closed-form analytical solution is derived by taking this bound as the new objective function and optimizing it with respect to the individual variational factors. We verify the effectiveness of this method in two real applications, namely text categorization and face detection.

15:00-18:00 Session 9: Poster Session 1
Chairs:
The Data Sharing Security System of Cloud Storage

ABSTRACT. Because of the openness of cloud storage architectures and the sharing of resources, data owners have lost control of their stored data, leading to frequent leakage of private user data; security issues have therefore been a significant element restricting the development of cloud storage. In this paper, three key algorithms are proposed for the data sharing security problem in the cloud storage environment: key distribution, hybrid encryption and dynamic key update. Based on these algorithms, a data security sharing model for the cloud storage environment is proposed to address trust dependence, user collusion attacks and dynamic data security issues, and to protect private data during storage and sharing. Analysis of the experimental results shows that the data security sharing technology can resist chosen-plaintext and collusion attacks. Therefore, a cloud storage system that encrypts shared data using this technology can effectively protect data confidentiality in the cloud storage context.
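The abstract does not spell out its hybrid scheme. As a generic illustration of the idea only (a fast symmetric cipher for the shared data, an asymmetric cipher to distribute the data key), here is a sketch with the cryptography package using AES-GCM and RSA-OAEP; the algorithm choices and key sizes are assumptions, not the paper's design:

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# data owner encrypts the shared file with a fresh symmetric key
data_key = AESGCM.generate_key(bit_length=256)
nonce = os.urandom(12)
ciphertext = AESGCM(data_key).encrypt(nonce, b"shared cloud data", None)

# the data key (not the data) is wrapped with each authorised user's public key
user_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)
oaep = padding.OAEP(mgf=padding.MGF1(hashes.SHA256()), algorithm=hashes.SHA256(), label=None)
wrapped_key = user_private.public_key().encrypt(data_key, oaep)

# an authorised user unwraps the key and decrypts; the cloud never sees the plaintext
recovered_key = user_private.decrypt(wrapped_key, oaep)
print(AESGCM(recovered_key).decrypt(nonce, ciphertext, None))   # b'shared cloud data'
```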

The Spatial Data Sharing Security Model Based on Hybrid Encryption Algorithm

ABSTRACT. Hybrid encryption algorithms are flexible tools for modeling the correlation of random variables. They cover the range from completely negative correlation to positive correlation, include the independent case, and encompass asymmetric correlation as well as the broadly employed Gaussian correlation structure. The pair-encryption algorithm of the hybrid encryption algorithm takes advantage of the ease of use of the two-variable encryption algorithm, and it is recommended to decompose the hybrid encryption algorithm into a set of two-variable encryption algorithms. We have successfully applied this method to spatial data and established a powerful interpolation method on the basis of the spatial logarithm.

Image reconstruction for electrical capacitance tomography using improved compressed sensing algorithm

ABSTRACT. Electrical capacitance tomography (ECT) has great application potential in process monitoring, and its visualization results are of great significance for studying changes in two-phase flow in closed environments. In this paper, dictionary learning replaces the a priori orthogonal basis in compressed sensing (CS) theory for sparse representation of the pixel signal. Because the trained overcomplete dictionary is able to match the few features of interest in the reconstructed ECT image, it is not necessary to rely on sparsity to solve the nonlinear mapping. Two-phase flow distribution in a pipe was simulated, and three variants without the sparse constraint, based on the Landweber, Tikhonov and Newton-Raphson algorithms, were used to reconstruct a 2-D image.

Graph Regularized Bayesian Tensor Factorization Based on Kronecker-Decomposable Dictionary

ABSTRACT. Tensor factorization techniques based on robust principal component analysis (RPCA) have achieved good performance in image processing and video analysis applications. Tensor factorization preserves intrinsic structure information better and recovers low-dimensional subspaces better than matrix factorization. However, in real-world data the outliers (noise) on each mode are usually non-independent and non-identically distributed (non-i.i.d.). In this paper, we propose a Bayesian tensor factorization model based on Kronecker-Decomposable Sparse Dictionary Learning (KDSDL). For 3D tensors, a non-i.i.d. mixture of outliers is embedded in the KDSDL model, together with a novel graph sparsity-inducing regularization that captures the correlation on different modes. Experiments on complex noise and video data demonstrate the superiority of the proposed methods over existing state-of-the-art methods.

Spatiotemporal Analysis of Traffic Scene Layout using Scene Stages

ABSTRACT. The spatiotemporal analysis of road image sequences has been a hot topic in the intelligent transportation systems community. Previous methods for traffic scene layout estimation are based on structured learning. In this paper, we propose a bottom-up layout analysis method based on a hierarchical spatio-temporal model using hidden conditional random fields (HCRF). Firstly, the bottom-level features are extracted from sub-regions. The local and global features of the image sequence are then fully combined across the spatial and temporal layers. Finally, a hierarchical model with strong discriminative ability is obtained. The experimental results demonstrate that the proposed method shows stronger discriminative ability than the state-of-the-art method on the TSD-max dataset.

Bearing Fault Diagnosis Based on Multiscale Feature Learning

ABSTRACT. For vibration signals measured at different speeds or loads, the different states of bearings show considerable internal variability, which further increases the difficulty of extracting consistent fault-signal features. We believe that in order to improve the performance of feature learning and classification, a more comprehensive and extensive extraction and fusion of signals is needed. However, existing multiscale multi-stream architectures rely on concatenating features at the deepest layers, which stacks multiscale features by brute force but does not allow complete fusion. This paper proposes a novel multiscale shared learning network (MSSLN) architecture to extract and classify the fault features inherent in multiscale factors of the vibration signal. The merits of the proposed MSSLN are as follows: 1) a multi-stream architecture is used to learn and fuse multiscale features from raw signals in parallel; 2) the shared learning architecture can fully exploit the shared representation and its consistency across multiscale factors. These two characteristics help MSSLN make a more faithful diagnosis than existing single-scale and multiscale methods. Extensive experimental results on the Case Western Reserve dataset demonstrate that the proposed method has high accuracy and excellent generalization.

The Correlation Analysis Model of Information Security Events Based on the Adaptive Optimization Algorithm

ABSTRACT. As sophisticated network attack methods increase, decentralized security event processing based on a single device can no longer meet the current needs of network security management. Security event correlation analysis technology analyzes the various security events through correlation, so that meaningful security events can be accurately judged and extracted. This paper proposes an information security event correlation analysis method based on an adaptive optimization algorithm, which imitates the formation of clusters in a two-dimensional gas in which particles keep moving until they irreversibly collide and "stick" together. A simulation is established to analyze the aggregation of information security events.

An Improved Bi-directional Maximal Matching Algorithm based on Optimized Word Connection

ABSTRACT. In this paper, we propose a novel word segmentation algorithm: the Bi-directional Maximal Matching algorithm based on Optimized Word Connection (BiMM-OWC). Based on the idea of sub-dictionary division, the algorithm realizes fast word matching. By introducing single-word connection, its accuracy is also improved. A selection strategy between Forward Maximal Matching (FMM) and Backward Maximal Matching (BMM) allows words to be matched automatically under different conditions. We take clinical symptom text as an instance to verify the accuracy and efficiency of the algorithm. Compared with five representative algorithms, the BiMM-OWC algorithm achieves the best word segmentation performance on the symptom text. Specifically, the proposed method improves precision by around 25% and running time by around 37% compared with the Bi-directional Maximal Matching (BiMM) algorithm.
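Forward Maximal Matching itself is simple: at each position, take the longest dictionary word that matches, and fall back to a single character otherwise; the backward variant scans from the end. A minimal sketch of FMM follows (the paper's sub-dictionary indexing and single-word-connection refinements are not reproduced; the dictionary entries are made up):

```python
def forward_maximal_matching(text, dictionary, max_len=6):
    """Greedy longest-match segmentation scanning left to right."""
    words, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:   # unknown characters fall back to single words
                words.append(candidate)
                i += length
                break
    return words

# toy clinical-symptom style example (dictionary entries are illustrative)
dictionary = {"头痛", "恶心", "呕吐", "伴有"}
print(forward_maximal_matching("头痛伴有恶心呕吐", dictionary))
# ['头痛', '伴有', '恶心', '呕吐']
```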

DoS-resisting two-factor remote authentication and key exchange scheme with user anonymity for mobile communication

ABSTRACT. Wireless communication and the cloud are extremely popular because of their convenience, portability and immediacy. As the first line of defense for mobile communication with intelligent devices, identity authentication plays an increasingly important role in system security and communication privacy. In order to enhance the security and privacy of wireless communication, this paper investigates an anonymous password-based remote user authentication and key exchange scheme for intelligent devices. Firstly, the scheme achieves secure mutual authentication while preserving user privacy. Secondly, client puzzles are employed to resist denial-of-service attacks, and session keys with forward secrecy are established. Finally, the analysis demonstrates that our scheme works effectively and resists common known attacks, while maintaining efficient performance for mobile users.

A New Biometric-based, Mutual Authenticated Certificateless Key Agreement Protocol for Telecare Medical Information System

ABSTRACT. Telecare medicine information systems (TMIS) can provide remote users with high-quality services at home. However, it is not easy to ensure the security of user information in a complex network environment. This paper proposes a new biometric-based, mutually authenticated certificateless key agreement protocol for TMIS to protect user information security. The protocol is based on the difficulty of the discrete logarithm problem under the extended Canetti-Krawczyk (eCK) security model, and uses biometric information to provide additional security for users. Meanwhile, the protocol can resist all kinds of existing attacks. Compared with existing key agreement protocols, it has higher computing and communication efficiency.

Innovation of heavy industry enterprise management model

ABSTRACT. This paper studies the problems of the newly introduced perceived incentive system upgrade and management funding input in a company, establishes a production function model, and observes whether the introduction of the new management system can effectively help traditional mining enterprises improve production efficiency and production safety. Through empirical research, this paper finds that effective improvement of the enterprise management and incentive systems can improve the production efficiency of the enterprise, increase worker participation and promote the effective implementation of projects. Finally, based on the conclusions of the empirical research, this paper further verifies the reliability of the data and gives relevant suggestions.

Multi-scale underwater object tracking by adaptive feature fusion

ABSTRACT. Different from object tracking on the ground, underwater object tracking is challenging due to image attenuation and distortion. The challenges are further increased by the high-freedom motion of targets under water: target rotation, scale change and occlusion significantly degrade the performance of various tracking methods. To solve the above problems, this paper proposes a multi-scale underwater object tracking method with adaptive feature fusion. Gray, HOG (Histogram of Oriented Gradients) and CN (Color Names) features are adaptively fused in the Background-Aware Correlation Filter (BACF) model. Moreover, a novel scale estimation method and a high-confidence model update strategy are proposed to comprehensively address scale changes and background noise. Experimental results demonstrate a success ratio of 64.1% under the AUC criterion, better than the classic BACF and other methods, especially in challenging conditions.

Review of the Online Fault Diagnoses Methods for the Power Transmissions

ABSTRACT. In order to realize fault diagnosis of power transmission systems, this paper first introduces online monitoring methods using sensors. Then, the traveling wave method and the wavelet method for fault location on transmission lines are summarized according to real-time operation data and environmental variation data. Furthermore, neural networks and genetic algorithms for fault recognition based on big data are summarized. Finally, this paper discusses the principles, advantages and shortcomings of the different methods, which establishes a foundation for the intelligent diagnosis of power transmission systems.

Real-time research on functional distributed computer network interconnection

ABSTRACT. At present, functional distributed computer network interconnection is an important component of LAN applications; for example, regional power grid monitoring centers, aerospace survey ship center systems and airport ground monitoring systems mostly use this model. It is conceptually different from an organization's distributed computer system: each connected computer still operates autonomously, and the distribution is only functional. Therefore, traditional LAN interconnection methods such as Ethernet are still adopted in the structure. Unlike in a traditional local area network, the interconnected microcomputers generally have no affiliation with one another and are functionally equivalent.

Review of The Intelligent Diagnosis Methods for the Power Transmission Lines

ABSTRACT. The transmission line is an important part of the power system. Transmission lines often cross mountains and ridges and are extremely susceptible to malfunctions caused by lightning, pollution and icing. Therefore, identifying the cause of a transmission line fault and locating the fault have become major challenges in transmission line fault diagnosis. This article reviews and summarizes research on fault diagnosis of transmission lines. Taking transmission line fault diagnosis as the research object, it first analyzes the research value of and existing challenges in transmission line fault location, then outlines the development background of current research and the advantages and disadvantages of various diagnostic methods, and finally looks ahead to future research work.

Retinex based Underwater Image Enhancement using Attenuation Compensated Color Balance and Gamma Correction

ABSTRACT. Underwater image processing plays an important role in underwater object recognition, underwater target detection, marine exploration and other fields, and the underwater image is an important carrier of marine information. However, underwater images suffer from low contrast, color degradation, uneven illumination and detail loss, due to the absorption and scattering of light during underwater propagation and the different attenuation of different wavelengths. In this paper, we propose an underwater image enhancement method based on a Retinex variational framework. Firstly, we use a bilateral filter instead of the traditional Gaussian filter to decompose the original image into a reflectance image and an illumination image. Then, we design an attenuation-map-guided gray world method to compensate for color loss and use gamma correction to improve the illumination image. Finally, the enhanced underwater image is obtained by multi-scale fusion and contrast-limited adaptive histogram equalization. Qualitative and quantitative performance analysis shows that the proposed method outperforms existing underwater image enhancement methods.
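The color-balance and gamma steps in this pipeline have standard forms: the gray-world assumption rescales each channel so its mean matches the overall gray mean, and a gamma below 1 brightens the illumination component. A minimal NumPy sketch follows; the attenuation-map guidance and the Retinex decomposition themselves are not shown, and the gamma value and sample image are illustrative assumptions:

```python
import numpy as np

def gray_world(img: np.ndarray) -> np.ndarray:
    """Rescale each channel so its mean equals the global gray mean (img in [0, 1], HxWx3)."""
    channel_means = img.reshape(-1, 3).mean(axis=0)
    gains = channel_means.mean() / (channel_means + 1e-6)
    return np.clip(img * gains, 0.0, 1.0)

def gamma_correction(img: np.ndarray, gamma: float = 0.7) -> np.ndarray:
    """gamma < 1 brightens dark underwater illumination."""
    return np.power(np.clip(img, 0.0, 1.0), gamma)

# a greenish underwater patch: the weak red channel is boosted toward the gray mean
img = np.dstack([np.full((4, 4), 0.2), np.full((4, 4), 0.6), np.full((4, 4), 0.5)])
balanced = gamma_correction(gray_world(img))
print(balanced.reshape(-1, 3).mean(axis=0).round(2))   # roughly equal channel means
```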

Image target classification detection

ABSTRACT. With the rapid development of social networks and the increasing use of mobile devices, the scale of digital image data has increased sharply, and dynamic detection of object categories has gradually become a research hotspot in computer vision. This paper summarizes the key problems of dynamic object category detection. It first introduces the research background, then reviews object category detection techniques around four core technical points (source code compilation, functional testing, model training and model verification) together with training on different datasets and the corresponding evaluation standards, and finally lists test results of dynamic object category detection algorithms and summarizes the main research difficulties and future directions.

Pelvic Segmentation Based on MultiR2UNet

ABSTRACT. For MRI images of the pelvis, it is helpful for doctors to extract the pelvic structure quickly and accurately so that diseases in the pelvic area can be diagnosed and analyzed in time. Manually extracting skeletal contours from pelvic MRI images is not only time-consuming but also of low precision. Therefore, this paper proposes an image segmentation algorithm based on MultiR2UNet. We adopt R2UNet, which is accurate in the segmentation field, as the backbone network. Residual connections are used in the network skip layers, and MultiRes Blocks are used in up-sampling, which helps increase the depth of the network and extract more detailed features. Because the pelvic training samples are few and imbalanced, we performed data augmentation in the preprocessing stage, effectively enlarging the data samples. In the training phase, we propose a mixed loss function. After several rounds of training and testing, the gap between the pelvis segmentation produced by the proposed algorithm and the ground-truth label is small, with an overlap of about 91%, and the average segmentation time per image is about 0.012 s. The experimental results show that the proposed algorithm guarantees segmentation accuracy while being much faster than manual delineation; MultiR2UNet is an effective real-time pelvis segmentation algorithm.

Oracle Bone Inscriptions Multi-modal Knowledge Graph Construction and Applications
PRESENTER: Jing Xiong

ABSTRACT. To address the great learning difficulty, long learning period, wide range of weakly connected knowledge points, and low level of sharing in Oracle Bone Studies (OBS), Artificial Intelligence (AI) techniques are introduced into the traditional study of OBS. We have collected and organized over 120 years of basic Oracle Bone Inscription (OBI) data and academic literature to build an OBI big data platform. Based on this platform, we propose an OBI information processing application pyramid model, which aims to realize the digitization, datafication and intelligence of OBS; the basis of this intelligence is the OBS multi-modal knowledge graph. The OBS multi-modal knowledge graph provides a unified semantic space for multi-source heterogeneous data. Through multi-modal fusion and information complementation, the shortcomings of a single modality in OBI information processing can be overcome. The multi-modal knowledge graph organizes and manages the basic data to better serve OBI information processing research, such as OBI detection and recognition, computer-aided oracle-bone rejoining, and knowledge question answering assistants. Taking OBI detection and recognition as examples, we study applications of the OBS multi-modal knowledge graph. The experimental results show that the proposed method reaches 81.3% accuracy in detection and 80.43% accuracy in recognition, improvements of 3.7% in detection and 14.8% in recognition over the conventional methods.

Design and Realization of Performance Test of Video Apps Based on Android

ABSTRACT. In this paper, test scenarios were designed to monitor the main performance indexes of three common video apps, starting from studies on the performance indexes of mobile apps and the characteristics of video apps, with the aim of providing references for related studies and for users through analysis of the test results. A comparative performance test of iQIYI, BiliBili and QQlive was conducted on CPU, memory, GPU and data consumption, using an automated test method along with the PerfDog, iTest and ADB tools. According to the test results, all three video apps performed excellently in CPU utilization, staying far below the expected value. As for the GPU, BiliBili had the highest video quality, occupied the least memory and consumed the least data, while iQIYI had the worst performance in frame rate, memory occupation and data consumption.

Appearance-based Gaze Estimation with Multi-Modal Convolutional Neural Networks

ABSTRACT. Existing methods for appearance-based gaze estimation mostly regress gaze direction from eye images, neglecting facial information and head pose, which can be very helpful. In this paper, we propose a robust appearance-based gaze estimation method that regresses gaze directions jointly from the human face and eyes. The face and eye regions are located based on detected landmark points, and representations of the two modalities are modeled with convolutional neural networks (CNN), which are finally combined for gaze estimation by a fusion network. Furthermore, considering the varying impact of different facial regions on human gaze, spatial weights for the facial area are learned automatically with an attention mechanism and applied to refine the facial representation. Experimental results on the Eyediap benchmark dataset validate the benefit of fusing multiple modalities in gaze estimation, and the proposed method yields better performance than previous advanced methods.

Visual relation of interest detection based on part detection

ABSTRACT. Visual relation detection (VRD) aims to describe images with relation triplets like <subject, predicate, object>, paying attention to the interaction between every two instances. To detect the visual relations that express the main content of a given image, visual relation of interest detection (VROID) has been proposed as an extension of the traditional VRD task. Existing methods for general VRD are mostly based on instance-level features, and methods that adopt more detailed information only use part-level attention or human body parts; none take advantage of general semantic parts. Therefore, we propose an Interest Propagation From Part (IPFP) method on the basis of IPNet for VROID, which propagates interest along the chain "part-instance-pair-triplet" to detect visual relations of interest. The IPFP method consists of four modules: a Panoptic Object-Part Detection (POPD) module, a Part Interest Prediction (PartIP) module, an Instance Interest Prediction (InstIP) module and a Predicate Interest Prediction (PredIP) module. The POPD module extracts instances with instance features and instance parts with part features. The PartIP module predicts interest for every single part and for pairs of parts belonging to different instances, while the InstIP module predicts interest for single instances and instance pairs. The PredIP module predicts possible predicates for each instance pair. The interest scores of visual relations are the product of pair interest scores and predicate probabilities. We evaluate the performance of the IPFP method and the effectiveness of its components on the ViROI dataset for VROID.

Research on Stability and Accuracy of the OptiTrack system based on Mean Error

ABSTRACT. The OptiTrack system is one of the most advanced optical motion capture systems of recent years. Camera calibration is an indispensable part of optical motion capture: its function is to convert the 2D data obtained by each camera into spatial 3D data using multiple cameras. The mean error reflects the numerical relationship between the system coordinates and the world coordinates of the measurement area, so it is directly related to the measurement accuracy of the OptiTrack system. In this paper, the influence of the average calibration error on measurement stability and accuracy is obtained through repeated calibration and sampling under controlled variables.
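The mean error analyzed here is, in essence, the average Euclidean distance between marker positions reported by the system and their reference positions in world coordinates. A small NumPy sketch of that quantity (marker positions and units are illustrative, not measurements from the paper):

```python
import numpy as np

def mean_error(measured: np.ndarray, reference: np.ndarray) -> float:
    """Average 3-D Euclidean distance between measured and reference marker positions."""
    return float(np.linalg.norm(measured - reference, axis=1).mean())

# three markers, positions in millimetres
reference = np.array([[0.0, 0.0, 0.0], [100.0, 0.0, 0.0], [0.0, 100.0, 0.0]])
measured  = reference + np.array([[0.3, 0.0, 0.0], [0.0, 0.4, 0.0], [0.0, 0.0, 0.5]])
print(mean_error(measured, reference))   # 0.4 mm on average
```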

Using VIVE Tracker to Detect the Trajectory of Mobile Robots

ABSTRACT. The evaluation of mobile robot performance often requires expensive high-precision trajectory tracking equipment, and testing laboratories are often unable to afford such prices. Finding inexpensive equipment that can meet the detection requirements has therefore become a problem that needs to be solved. HTC's VIVE Tracker, a low-cost VR accessory, provides good measurement accuracy. This research focuses on using the VIVE Tracker for mobile robot performance testing, in particular the device's ability to capture and analyze the movement trajectory of a mobile robot. Testing and analysis of the VIVE Tracker verify that it can meet the accuracy requirements of mobile robot detection and can capture the movement trajectory of the mobile robot well for use in mobility performance analysis.

An Improved Transfer Learning with EMD for Parkinson’s Diagnosis

ABSTRACT. As is known, Parkinson's disease (PD) cannot be cured fundamentally and changes the lives of patients thoroughly, so automatic identification of early Parkinson's disease from feature datasets attracts many medical researchers. At present, machine learning and especially deep learning algorithms have been widely adopted for classification and regression tasks, but labeled datasets are rare and expensive to label in many areas, e.g., aerospace and medicine. Transfer learning is often employed to solve problems with small training datasets. In this paper, we propose a parameter-based transfer learning algorithm to enhance the generalization ability and avoid overfitting of the network. A new method is then utilized to accelerate the training of the network, helping the algorithm achieve results at high speed. Finally, the Earth Mover's Distance (EMD), a distance metric between two probability distributions of images, is introduced into the proposed transfer learning algorithm to enhance the precision of measurement. Experimental results on common Parkinson's datasets, compared with other modern algorithms, show the effectiveness of our algorithm.

Research on the Technology of Optical Motion Capture System for Robot Inspection

ABSTRACT. Optical motion capture systems have been widely used in measurement and control for biomechanics, medical treatment, UAV control and other fields. There have been experimental studies on the accuracy and precision of optical motion systems, but no study on how the layout of the optical cameras in the detection area determines the detection volume boundary, or on calibrating the accuracy and stability of the system within that detection volume. In this study, an optical motion capture system composed of eight OptiTrack Prime 13 cameras, arranged in a single ring with equal spacing in a given space, was designed. According to the size of the detected object, the cameras' viewing angles were adjusted through calculation to determine the maximum detection volume, and the system was calibrated within the detection volume to improve calibration accuracy. The layout data of the optical motion capture system can be quantified to make the system test data stable and reproducible, and to keep robot test parameters consistent.

Robust Texture Retrieval Based on Multivariate Log-Gaussian Mixture Model

ABSTRACT. This paper proposes an efficient and robust texture retrieval method that addresses the heavy-tailed distribution of texture. In the proposed scheme, a multivariate Log-Gaussian mixture model (MLGMM) is used to model the sharp peaks, heavy tails and even multimodal statistical properties of the two-dimensional Gabor coefficients of texture at different scales and orientations. The parameters of the MLGMM are estimated by the expectation-maximization (EM) algorithm. In our scheme, each class of texture is modeled by one MLGMM, and Bayesian classification is implemented by feeding the output of the MLGMM into a Bayesian classifier. Experiments on feature extraction and similarity measurement demonstrate the effectiveness of the proposed algorithms, and extensive experiments validate that our retrieval scheme achieves a higher average retrieval rate than related well-known texture statistical techniques.

An Improved Heterogeneous Graph Convolutional Network for Inter-relational Medicine Representation Learning

ABSTRACT. Medicine representation learning, which aims at exploring hidden relations among medicines, has become a significant technique for imitating the doctor's cognitive reasoning process. Most existing studies focus on the intuitive relationships between medication and diagnosis but ignore the inherent properties of medicines. This paper explores the knowledge underlying clinical medication by combining a heterogeneous graph convolutional network (HGCN) with a spectral clustering (SC) algorithm. Based on chronic obstructive pulmonary disease (COPD) data, the HGCN is utilized to infer the internal relations among medicines and their properties and to generate the medicine embeddings. Moreover, SC is introduced to divide the medicine embeddings into their corresponding syndromes. Compared with three baseline models and their variants on six evaluation metrics, the experimental results demonstrate that the HGCN-SC model achieves superior performance in medicine combination identification for COPD and improves accuracy by around 3.0% over SC alone.

A database of students’ spontaneous actions in the real classroom environment

ABSTRACT. Students' actions in the classroom play a key role in studying student performance in class. With the development of computer vision technology, automatic recognition of student actions has become possible. This study focuses on the automatic recognition of students' spontaneous actions in the real classroom environment. Considering the lack of data on such actions, this study first establishes a database of students' spontaneous actions in the real classroom environment. The database consists of 4,917 images of students' spontaneous actions, covering 10 learning states: raising a hand, standing up, taking notes, clapping, taking photos, looking up, holding the cheek, playing with a mobile phone, stretching and lying on the desk. Based on the characteristics of students' actions in the classroom, we propose a new 11-layer convolutional neural network algorithm based on EDSR. Since convolutional neural networks easily overfit small-sample data, a data augmentation method is introduced for data processing. The experimental results show that the proposed algorithm effectively improves the accuracy of action recognition on the database established in this paper, and the spontaneous-action database provides good data support for the study of students' classroom actions in the field of education.

RLGC: Residual Low Rank Group Sparsity Constraint for Image Denoising

ABSTRACT. Image denoising is an important topic in the field of image processing. With the application of nonlocal similarity in sparse representation, the work of image denoising began to be performed on similar patch groups. The sparse representations of patches in a group will be learned together. In this paper, we propose a novel image denoising model by combining group sparsity residual with low-rankness. Firstly, motivated by the relationship between low rank and sparsity, a low rank constraint is imposed on the sparse coefficient matrix of each similar patch group to enhance the sparsity. Secondly, since γ-norm can most closely match the true rank of a matrix, it is applied for rank approximation in our model. Finally, in view of the fact that numerous iterations are required in the group sparse representation (GSR) model, we develop an efficient algorithm based on the Majorize-Minimization (MM) optimization. It greatly reduces the computational complexity and the number of iterations. Experimental results show that our model makes great improvements in image denoising and outperforms many state-of-the-art methods.

Multi-Level Query Interaction for Temporal Language Grounding

ABSTRACT. Temporal language grounding aims to localize the desired moment in an untrimmed video given a sentence query relevant to that moment. To tackle this problem, previous methods often learn word-level or phrase-level features of the query, or directly generate a global sentence representation by attention mechanisms or graph networks. However, we argue that applying only word-level or phrase-level semantic information and cross-modal interactions is not enough to fully capture the correspondence between the video and the query. To this end, we propose a novel Multi-level Query Exploration and Interaction (MQEI) model, which explores the semantics at both the word and phrase levels and captures the multi-level interactions between the video and the query through an attention module. Extensive experiments on two public benchmark datasets, ActivityNet Captions and Charades-STA, demonstrate that the proposed model consistently outperforms all the state-of-the-art methods.

A Review of Natural Language Processing for Financial Technology

ABSTRACT. In the past few years, the development of natural language processing has been able to deal with many issues such as emotional analysis, semantic analysis, and so on. This review first introduces the development of natural language processing, and then summarizes their applications in financial technology, which mainly focuses on public opinion analysis, financial prediction and analysis, risk assessment, intelligent question answering, and automatic document generation. The analysis shows that natural language processing can give full play to its advantages in the financial field. Moreover, this paper also discusses the problems and challenges for financial technology that are developed based on natural language processing. Finally, this paper presents two developing trends of natural language processing in financial technology: deep learning and knowledge graph.

Multi-task infrared pedestrian detection method

ABSTRACT. Pedestrian detection in infrared images often suffers from two problems: 1) the weak features of infrared images result in false alarms; 2) the generalization ability of infrared pedestrian detection methods is unsatisfactory, since infrared images are similar to one another due to limited acquisition methods. To solve these problems, we propose a multi-task infrared pedestrian detection method. Firstly, a U-Net segmentation network is used to predict the pedestrian activity area, and detected objects in non-pedestrian regions are filtered out to reduce false alarms. Secondly, domain adaptation is introduced to align the features of infrared images and visible-light images, so that visible-light images can be used as additional data to improve scene diversity and generalization ability. The experimental results show that, compared with EfficientDet, our method improves the average precision (AP) by 1.4% on the XDU-NIR2020 dataset and 1.9% on the CVC-09 dataset.