09:00-11:00 Session 2: Award Session (This program is scheduled in Japan Standard Time Zone)
ASDN: A Deep Convolutional Network for Arbitrary Scale Image Super-Resolution

ABSTRACT. Deep convolutional neural networks have significantly improved the peak signal-to-noise ratio of Super-Resolution (SR). However, image viewer applications commonly allow users to zoom images to arbitrary magnification scales, which so far has required training at a large number of scales at a tremendous computational cost. To obtain a more computationally efficient model for arbitrary-scale SR, this paper employs a Laplacian pyramid method to reconstruct high-resolution (HR) images at any scale using the high-frequency image details in a Laplacian Frequency Representation. For SR of small scales (between 1 and 2), images are constructed by interpolation from a sparse set of precalculated Laplacian pyramid levels. SR of larger scales is computed by recursion from small scales, which significantly reduces the computational cost. For a full comparison, fixed- and any-scale experiments are conducted using various benchmarks. At fixed scales, ASDN outperforms predefined upsampling methods (e.g. SRCNN, VDSR, DRRN) by about 1 dB in PSNR. At arbitrary scales, ASDN generally exceeds Meta-SR on many scales.

Robust Motion Averaging under Maximum Correntropy Criterion

ABSTRACT. Recently, the motion averaging method has been introduced as an effective means to solve the multi-view registration problem. This method aims to recover global motions from a set of relative motions, but the original method is sensitive to outliers because it uses the Frobenius norm error in the optimization. Accordingly, this paper proposes a novel robust motion averaging method based on the maximum correntropy criterion (MCC). Specifically, the correntropy measure is used instead of the Frobenius norm error to improve the robustness of motion averaging against outliers. Using the half-quadratic technique, the correntropy-based optimization problem can be solved by an alternating minimization procedure, which includes the operations of weight assignment and weighted motion averaging. Further, we design an adaptive kernel width selection strategy to take advantage of correntropy. Experimental results on benchmark data sets illustrate that the new method has superior accuracy and robustness for multi-view registration.
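
The alternation between weight assignment and weighted averaging can be illustrated on a much simpler problem than multi-view registration. The sketch below is not the authors' algorithm: it averages plain vectors instead of rigid motions, uses a fixed kernel width `sigma` rather than the adaptive strategy mentioned in the abstract, and starts from a coordinate-wise median, which is an assumption made here for illustration. The function name `correntropy_mean` is hypothetical.

```python
import math
import statistics


def correntropy_mean(points, sigma=1.0, iters=50, tol=1e-9):
    """Robust mean of equal-length vectors via half-quadratic iterations.

    Illustrative only: vectors stand in for motions, and sigma is fixed.
    """
    dim = len(points[0])
    # Assumed initialisation: coordinate-wise median (not from the abstract).
    mean = [statistics.median(p[d] for p in points) for d in range(dim)]
    for _ in range(iters):
        # Step 1, weight assignment: Gaussian kernel of each squared residual.
        weights = [
            math.exp(-sum((p[d] - mean[d]) ** 2 for d in range(dim))
                     / (2.0 * sigma ** 2))
            for p in points
        ]
        total = sum(weights)
        if total == 0.0:
            break  # every sample is too far away for this kernel width
        # Step 2, weighted averaging with the weights held fixed.
        new_mean = [
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ]
        shift = max(abs(a - b) for a, b in zip(new_mean, mean))
        mean = new_mean
        if shift < tol:
            break
    return mean


samples = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.0), (100.0, 100.0)]
robust = correntropy_mean(samples)
plain = [sum(p[d] for p in samples) / len(samples) for d in range(2)]
```

In this toy data set the single outlier at (100, 100) receives a weight close to zero, so `robust` stays near the inliers while `plain` is pulled far away from them.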

Competitive Multi-Swarm Optimizer with Dynamic Grouping for Large-Scale Optimization

ABSTRACT. In this paper, we propose a novel competitive multi-swarm optimizer with dynamic grouping (CMSO-DG) for large-scale optimization problems. CMSO-DG is inspired by biological migration in nature, and applies a migration mechanism among subpopulations to enhance its diversity. We also introduce a dynamic grouping strategy in each subpopulation and a dynamic parameter adjustment in each iteration of the evolution of CMSO-DG. The purpose of these operations is to ensure that CMSO-DG can preserve a high diversity in the early stages of optimization, and can quickly locate the global optima in the later stages. A series of experiments, conducted on the CEC'2010 and CEC'2013 benchmark functions, show that CMSO-DG achieves a good balance between exploration and exploitation, and outperforms several state-of-the-art evolutionary algorithms (EAs).

Representation Separation Adversarial Networks for Cross-Modal Retrieval

ABSTRACT. Cross-modal retrieval aims to search for semantically similar instances in other modalities given a query from one modality. Recently, generative adversarial networks (GANs) have been proposed to model the joint distribution over the data from different modalities and to learn the common representations for cross-modal retrieval. However, most existing GAN-based methods simply project the original representations of different modalities into a common representation space, and ignore the fact that different modalities share common characteristics while each modality also has its individual characteristics. To address this problem, in this paper, we propose a novel cross-modal retrieval method, called Representation Separation Adversarial Networks (ReSAN), which explicitly separates the original representations into common latent representations and private representations. Specifically, we minimize the correlation between the common representations and private representations to ensure their independence. Then, we reconstruct the original representations by exchanging the common representations of different modalities to encourage information swapping. Finally, the labels are utilized to increase the discriminability of the common representations. Comprehensive experimental results on two widely used datasets show that the proposed method achieves better performance than many existing GAN-based methods, and demonstrate that explicitly modeling the private representation for each modality helps the model extract common latent representations.

11:00-18:20 Session 3: Oral Session (This program is scheduled in Japan Standard Time Zone)
Coal Dust Image Recognition Based on Improved VGG Convolution Network

ABSTRACT. An improved image segmentation model based on the VGG-16 convolutional neural network is proposed to segment coal dust images in micrographs. Based on the VGG-16 network model, this model adds SELayer to the first two convolution modules, replaces the SoftMax classifier in the original VGG-16 network with a binary classifier, optimizes the model structure and parameters, and learns to share the weight parameters of the convolution layer and the pooling layer in the pre-training model through micro-migration. Samples were randomly selected from the constructed coal dust image set as training and test sets to evaluate the performance of the model. The experimental results show that the model can effectively identify the characteristic image of coal dust, and the recognition accuracy is up to 90%. The model has good recognition performance in the field of coal dust treatment, and can accurately recognize microscopic coal dust images.

Rotational anchor mechanism for multi-oriented scene text detection
PRESENTER: Wenjun Yang

ABSTRACT. In recent years, multi-oriented scene text detection has attracted much attention due to its critical role in real-time visual applications, such as intelligent transportation systems, travel translation software, and blind visual-aid systems. Most of the existing text detection methods are for horizontally arranged texts. However, multi-oriented texts commonly exist in natural scene images. In this paper, we propose a multi-scale rotational anchor mechanism to detect multi-oriented texts. Our method consists of four main components: convolution feature map generation via a convolutional neural network, multi-oriented text segment generation using multi-scale rotational anchors, text character sequence context feature extraction through a bidirectional long short-term memory network, and text line construction using the text segment context relationships. Our method can be applied to natural scene images that contain multi-oriented text and multi-language text. We evaluated our method using the ICDAR2013 and ICDAR2015 datasets, achieving 82.0% and 78.9% F-scores, respectively. These results demonstrate that our approach outperforms existing text detection methods.

Research on Task Assignment for Unmanned Aerial Vehicles Swarm

ABSTRACT. Task assignment for an unmanned aerial vehicle (UAV) swarm involves multiple targets, multiple missions, and multiple constraints, which leads to prohibitive computational complexity. To solve this issue, we draw on the swarm intelligence of grey wolf hunting and propose a UAV swarm task assignment method based on grey wolf optimization. First, we construct a multiple task assignment model with a dual objective function. Second, we design a layered encoding scheme to establish the task assignment matrix. Then, we present an archive-shared strategy to store the Pareto optimal sets for the swarm. Finally, we develop a matching rule to achieve the final assignment result. The simulation results demonstrate that, in terms of the task assignment rate, the proposed algorithm improves on the multi-objective grey wolf optimization algorithm and the multi-objective particle swarm optimization algorithm by approximately 21% and 20%, respectively. Besides, it has better convergence than other algorithms for large-scale assignment problems.

TumorGAN: A Multi-modality Data Augmentation Framework for Brain Tumor Segmentation

ABSTRACT. While deep learning is widely used for medical image processing tasks such as tumor segmentation, collecting paired medical images is still difficult. Collecting multi-modality data pairs is even more challenging. Insufficient image pairs restrict the development of medical image segmentation. To relieve the burden of medical data labeling, we propose a novel framework called TumorGAN, which can generate tumor image segmentation pairs based on unpaired adversarial training. Furthermore, we include a regional perceptual loss to enhance the performance of the discriminator. Moreover, we develop a regional $l_{1}$ loss to constrain the color of the brain tissue. Finally, we verify the performance of TumorGAN on the public brain tumor dataset BraTS 2017. Experimental results show that the synthetic data pairs generated by TumorGAN can practically improve tumor segmentation performance when applied to segmentation network training.

Entity Alignment via Knowledge Embeddings and Type Matching Constraint

ABSTRACT. Entity Alignment (EA) is the process of discovering entity pairs that represent the same real object in different Knowledge Graphs (KGs). Currently, mainstream EA methods disregard the semantic structure of an entity and the surrounding entities implied in a KG and mainly rely on the vector representation of the entity. In this paper, we propose a new EA framework based on knowledge embeddings (KEs) and a type matching constraint. By combining the similarity of entity vectors with the degree of entity type matching, we can perform more accurate cross-KG EA. The experimental results show that, compared with existing methods, the proposed method can significantly improve the accuracy of EA. It can also be extended to other embedding-based methods.

OnlineLSTM: Incremental Ensemble LSTM Model

ABSTRACT. Neural networks often adopt an off-line batch mode for model training. To be updated with the information from new data, the network has to be retrained on the merged new and old data. This is very time-consuming and causes catastrophic forgetting. To address this, we propose an incremental ensemble LSTM model, OnlineLSTM, which fuses ensemble learning and transfer learning to implement parallel training and incremental updating of the model while exploiting the powerful nonlinear fitting ability of the LSTM. The LSTM is employed to fit the time series data efficiently and dynamically. This paper also designs efficient network training optimization techniques, including sample-based and model-based transfer learning algorithms and a weak-learner buffer pool algorithm. The experimental results show that the proposed methods are efficient and effective for model training.

Fast and Accurate Online Sequential Learning of Respiratory Motion with Random Convolution Nodes for Radiotherapy Applications

ABSTRACT. Accurate prediction of respiratory motion is essential for motion-adaptive radiotherapy applications because of the positioning lag incurred while tracking the tumor motion. Despite a plethora of works on respiratory motion prediction, present-day tracking capabilities are prone to errors that are mainly caused by the non-stationary nature of respiratory motion as well as inter- and intra-trace variabilities. As a solution to this issue, re-training of the prediction model at regular intervals is proposed. This solution, however, demands a trade-off between the re-training interval and the prediction accuracy of the future tumor location: re-training at small intervals increases the computational requirements, whereas larger intervals hamper the prediction performance due to the complex nature of respiratory motion. To address these issues, in this paper, a prediction model that relies on random convolution nodes (RCN) governed by local receptive fields (LRFs) is proposed for respiratory motion prediction. The LRFs extract the features that contribute to the local patterns as well as the non-stationary patterns in recent samples, and these features are subsequently learned using extreme learning machine (ELM) theories. To address the re-training issue, we propose a sequential learning framework, named OS-fRCN, that can update the model parameters at regular intervals. The suitability of the proposed OS-fRCN for respiratory motion prediction is evaluated on 304 real respiratory motion traces. A comparison with existing prediction methods at four prediction horizons in line with commercially available radiotherapy systems is conducted to analyze the robustness of the proposed method to inter- and intra-trace variabilities of respiratory motion. Results demonstrate that the proposed OS-fRCN has lower computational complexity and yields robust, accurate prediction performance when compared to its re-trained counterpart and existing respiratory motion prediction methods.

Maximum Correntropy Information Filter for Linear non-Gaussian Systems

ABSTRACT. This paper concerns the information filtering problem for linear non-Gaussian systems. Under the maximum correntropy criterion (MCC), an iteratively updating information filter, abbreviated as MCIF, is proposed by using fixed-point theory. The system state is first predicted in the same way as in the traditional information filter to obtain the prior prediction of the state and its one-step prediction information matrix. Then, an iterative updating approach is presented to calculate the state estimates. The iterative updating approach is derived by constructing a fixed-point function. It should be pointed out that the iteration variable is included in the filter gains. With the help of the iteratively updated filter gains, the state estimate and filtering information matrix are also iteratively updated. An illustrative simulation verifies the effectiveness of MCIF.

An improved genetic algorithm for mobile robot path planning in grid environment

ABSTRACT. This paper proposes an improved genetic algorithm for the path planning of mobile robots in a grid environment. First, the motion plane of the robot is rasterized, and a serial number coding method together with a heuristic median insertion method is used to establish the initial population, ensuring that the planned initial paths are all feasible and thereby speeding up the convergence of the algorithm. Then, different weights are assigned to the path length, path security, and path energy consumption, which are combined to generate a multi-objective fitness function. Finally, some genetic operations are improved to maintain the population diversity of the algorithm in the later period and to prevent the algorithm from converging prematurely. Simulation experiments show that the proposed algorithm can quickly plan a feasible path in the grid environment. The path is not only shorter in length, but also more stable. At the same time, the running speed of the algorithm is 45.7% higher than that of other improved algorithms.

Research on the algorithm of the spread prediction of forest diseases and pests

ABSTRACT. Due to the severe economic losses caused by forest diseases and pests in China, predicting the spread of forest diseases and pests has become one of the most challenging and actively studied issues. Most previous solutions have at least the following three disadvantages: (1) lacking effective utilization of image data; (2) only supporting a one-dimensional prediction value, which provides limited information; (3) being limited to a small scale (e.g., a sample plot), rather than a large scale like a forest zone. Therefore, we propose an algorithm for spread prediction based on linear regression, applied to the large regional spread of forest diseases and pests. Compared to most conventional numerical prediction methods, our prediction method works in two dimensions. Specifically, the disease and pest areas are fitted by cubic B-spline curves, and an energy function is defined to describe the difference between the contours at the future time and the current time. Then, linear regression is applied to predict the spread parameters (the distance and angle), adhering to prior forestry research. After two corrections, the final predicted contour is obtained. Finally, we devise an appropriate 3D interactive visualization. Experimental results indicate that the proposed algorithm can effectively predict the spread of forest diseases and pests, providing forestry workers with visual aids of the future situation.

Clustering-Based Reinforcement Learning for Optimal Handover Triggering Mechanism in 5G Ultra-Dense Networks

ABSTRACT. The ultra-dense network (UDN) is considered one of the key technologies in 5G, which can provide a high transmission rate and efficient radio resource management to mobile users. However, the UDN leads to the dense deployment of small base stations (BSs), which could introduce higher interference and subsequently increase the complexity of handover management. At present, the conventional handover triggering mechanism (also known as the A3 event) is only designed for macro mobility, thus creating unwanted side effects in UDNs such as frequent handover, ping-pong handover, and handover failure. These effects in turn degrade the overall network performance. In addition, the massive number of BSs significantly increases the system workload for network maintenance. To address these issues, this paper proposes an intelligent handover triggering mechanism based on the Q-learning framework and subtractive clustering techniques. The input metrics are first converted to a state vector by subtractive clustering, which could improve the efficiency and effectiveness of the training process. Afterwards, the Q-learning framework learns the optimal handover triggering policy from the environment. The trained Q-table is used as a reference for the decision to trigger a handover. The simulation results demonstrate that the proposed method can ensure stronger mobility robustness of user equipment (UE), with a 60%-90% improvement over the conventional approach with respect to the number of handovers, ping-pong handover rate and handover failure rate, while maintaining other KPIs, i.e. throughput and network latency, at a relatively high level. Moreover, the adoption of subtractive clustering yields another 20% improvement over the approach based only on Q-learning in terms of all evaluated KPIs.
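
The Q-table mentioned in the abstract is typically maintained with the standard tabular Q-learning update. The sketch below shows only that generic update and a greedy lookup. The state identifiers (imagined here as cluster indices), the two action names, the reward value, and the learning parameters `alpha` and `gamma` are all assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical action set: trigger a handover or stay on the serving cell.
ACTIONS = ("handover", "stay")


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q-value."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


def greedy_action(q, state):
    """Look up the action with the highest learned value for a state."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


q_table = defaultdict(float)  # unseen (state, action) pairs start at 0.0
new_value = q_update(q_table, state=0, action="handover",
                     reward=1.0, next_state=1)
decision = greedy_action(q_table, state=0)
```

After training, a trigger decision reduces to a table lookup such as `greedy_action(q_table, state)`, which is what makes a trained Q-table usable as a reference at run time.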

Higher Accuracy and Lower Complexity: Convolutional Neural Network for Multi-Organ Segmentation

ABSTRACT. In computed tomography (CT), segmentation of organs-at-risk (OARs) is a key task in formulating the radiation therapy (RT) plan. However, it takes a lot of time to delineate OARs slice by slice in CT scans. Deep convolutional neural networks make it possible to segment medical images automatically and effectively. In this work, we propose an improved 2D U-Net to segment multiple OARs, aiming to increase accuracy while reducing complexity. Our method replaces vanilla convolutions with Octave Convolution (OctConv) units to reduce memory use and computation cost without sacrificing accuracy. We further plug a ‘Selective Kernel’ (SK) block after the encoder to capture multi-scale information and adaptively recalibrate the learned feature maps with an attention mechanism. An in-house dataset is used to evaluate our method, where four chest organs are involved: left lung, right lung, heart, and spinal cord. Compared with the naive U-Net, the proposed method can improve Dice by up to nearly 3% and requires fewer floating-point operations (FLOPs).
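
For readers unfamiliar with the Dice score used as the accuracy metric here, a minimal sketch of the standard definition follows. It operates on flat binary masks, and the convention of returning 1.0 when both masks are empty is an assumption chosen for this example. The function name `dice_coefficient` is hypothetical.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |A and B| / (|A| + |B|) for two flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if size_sum == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / size_sum


score = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])
```

Here one foreground pixel overlaps and the masks contain three foreground pixels in total, so `score` equals 2/3.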

Object Localization Based on Natural Language Descriptions for Fine-Grained Image

ABSTRACT. As a tool to express the unified semantics of objects, language can be used to describe and locate objects within the scope of human vision. Searching for the position of an object in the field of vision through natural language is an important capability of human perception. Proposing a mechanism to learn this human ability is a major challenge for computer vision. Most existing object localization methods use the strong supervised information of the training set to train the model. However, these models lack interpretability and require expensive tag information, which is difficult to obtain. Facing these challenges, we propose a new method for locating objects in fine-grained images by natural language descriptions. Firstly, we propose a method that can learn the semantically relevant parts between fine-grained images and language, and achieve ideal localization accuracy without using a strong supervisory signal. Moreover, we improve the Rank-base Loss function so that the natural language description better matches the target area in fine-grained images. Multi-scale fusion techniques are utilized to better capture image details when detecting fine-grained images. As far as we know, this is the first attempt to use language for fine-grained image localization. Comprehensive experiments demonstrate that the proposed method achieves ideal localization results on the CUB200-2011 dataset. The proposed model also has strong zero-shot learning ability on untrained data.

An auto visual servoing manipulator for seafood harvesting

ABSTRACT. Seafood growing on the sea bottom is difficult to harvest. In this work, an auto visual servoing manipulator is proposed to automatically collect the seafood scattered on the seafloor. The self-grasping method is realized through closed-loop feedback control of the visual information acquired by the vision system. The YOLO framework is used as the detection unit to recognize the object from the eye-in-hand module. Then, the center coordinate of the object is transformed to a 3D position by the camera calibration matrix. The movement of each joint of the manipulator is then calculated. The proposed manipulator is lightweight and low-cost and can be mounted on any underwater unmanned vehicle, helping to harvest quickly.

A MoCap Tool for Real-time Entertainment Interaction

ABSTRACT. In this paper, we present a virtual-real game framework for human body interaction using a passive optical motion capture device. In order to reduce the noise produced by motion capture, a noise reduction approach is first designed based on the Kalman filter algorithm, which can estimate the position of a marker and improve the accuracy of the data. For game interaction, a bounding box is then created according to the geometry of each object for detecting collisions between virtual objects and real objects, preventing penetration during interaction. Finally, the game system is designed and implemented based on the capture device. The experimental results show that our system can implement virtual-real interaction well in diverse scenes, such as between a human and a rigid body, between virtual objects, and between a human and real objects.
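
To give a sense of how a Kalman filter suppresses marker jitter, here is a minimal one-dimensional sketch. It assumes a random-walk model for a single marker coordinate, and the noise variances `process_var` and `meas_var` are illustrative values chosen here, not parameters reported by the authors. The function name `kalman_smooth` is hypothetical.

```python
def kalman_smooth(measurements, process_var=1e-3, meas_var=0.25):
    """Filter one marker coordinate with a scalar random-walk Kalman filter."""
    estimate = measurements[0]   # start from the first observation
    error_var = meas_var         # initial uncertainty of that estimate
    filtered = [estimate]
    for z in measurements[1:]:
        # Predict: the position is assumed to drift slowly between frames.
        error_var += process_var
        # Update: blend the prediction and the measurement by the gain.
        gain = error_var / (error_var + meas_var)
        estimate += gain * (z - estimate)
        error_var *= (1.0 - gain)
        filtered.append(estimate)
    return filtered


raw = [1.0, 1.2, 0.8, 1.2, 0.8, 1.2, 0.8]
smooth = kalman_smooth(raw)
raw_spread = max(raw) - min(raw)
smooth_spread = max(smooth) - min(smooth)
```

With the jittery input above, `smooth_spread` comes out smaller than `raw_spread`. A real motion capture pipeline would more likely track a 3D position and velocity per marker, which needs the matrix form of the same predict and update steps.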

A Chinese-Japanese Vowel Priority Lip Matching Scheme Based on Humanoid Robot Ren-Xin

ABSTRACT. At present, the significance of humanoid robots has increased dramatically, yet robots of this kind still rarely enter human life because of their immature development. The lip shape of humanoid robots is crucial in the speech process since it makes humanoid robots look like real humans. It has been shown that vowels are the basic elements of pronunciation in all languages in the world. Building on traditional viseme research, we increase the priority of smooth lip transitions between vowels and propose a lip matching scheme based on vowel priority. Additionally, we design a similarity evaluation model based on the Manhattan distance over lip features obtained by computer vision, which quantifies lip shape similarity on a scale from 0 to 1 and provides an effective evaluation standard. This model compensates for the shortcomings of lip shape similarity evaluation criteria in this field. We applied this lip-matching scheme to the Ren-Xin humanoid robot and performed robot teaching experiments as well as a similarity comparison experiment on 20 sentences with 2 males, 2 females and the robot. Notably, all the experiments achieved good results.
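
One simple way to turn a Manhattan distance into a similarity between 0 and 1 is sketched below. The abstract does not give the authors' formula, so this normalisation is an assumption: each lip feature is taken to be pre-scaled to the range [0, 1], and the similarity is one minus the mean absolute difference. The function name and the feature values are hypothetical.

```python
def manhattan_similarity(features_a, features_b):
    """Return a similarity in [0, 1] for two feature vectors scaled to [0, 1]."""
    if len(features_a) != len(features_b) or not features_a:
        raise ValueError("feature vectors must be non-empty and equal in length")
    distance = sum(abs(a - b) for a, b in zip(features_a, features_b))
    # The largest possible distance is one per feature, so divide by the length.
    return 1.0 - distance / len(features_a)


# Hypothetical features, e.g. normalised mouth width and mouth opening.
human_lips = [0.2, 0.4]
robot_lips = [0.4, 0.4]
similarity = manhattan_similarity(human_lips, robot_lips)
```

Identical lip shapes score 1.0 and maximally different ones score 0.0 under this normalisation.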

Region- and pixel-level multi-focus image fusion through convolutional neural networks

ABSTRACT. Capturing all-in-focus images of three-dimensional scenes is typically a challenging task due to depth of field limitations, and various multi-focus image fusion methods have been employed to generate all-in-focus images. However, no existing method achieves better performance in less time. Therefore, we combine region-level and pixel-level processing to reduce fusion time. First, a classifier built on ResNet18 generates focus, defocus and boundary regions. In order to improve the recognition of pixels near the boundary, a novel pixel-level classifier is proposed. Based on the COCO-2017 dataset, a new dataset is created that can reduce computations and consider the relationship between source images, so that the proposed method can accurately classify the regions and pixels without artifacts. Experimental results show that the proposed method outperforms other state-of-the-art methods in visual perception and objective metrics. Additionally, the proposed method can reduce the computations by more than 80% compared to other CNN-based methods.

Speech Emotion Recognition Based on Data Enhancement in Time-Frequency Domain

ABSTRACT. The lack of voice data samples in current speech emotion recognition research leads to poor recognition performance and a tendency to over-fit. To address this, a speech emotion recognition approach based on data enhancement is proposed. The Berlin Emotional Corpus (EMO-DB) is enhanced and expanded in two directions: the time domain and the frequency domain. The recognition rates of two classifiers, k-Nearest Neighbor (KNN) and Support Vector Machine (SVM), are studied and analyzed. Based on the expanded sample set, we extract features and then train and test the classifiers. Experiments show that recognition improves after data enhancement.

Constructing Semantic Feature Map for Sentence Pair Modeling

ABSTRACT. Sentence semantic matching (SSM) is a fundamental task in natural language processing, which is valuable but very challenging due to the difficulty of modeling complex sentence representations and interactions. Recent work on deep neural models has shown their potential for improving the performance of the SSM task. However, existing work usually employs recurrent or 1D convolutional neural networks (CNNs) to learn sentence representations, leading to limited performance improvement. We argue that the 2D convolutional neural network is more powerful at learning and capturing sophisticated interactions between sentences. In this work, we propose a novel sentence semantic matching model with Hierarchical CNN on Dimension-augmented Representation (HiDR for short). HiDR first generates a dimension-augmented representation of the input sentences with a bidirectional LSTM, then applies a hierarchical 2D CNN to learn and capture their interactions and predict the matching degree. To assess the effectiveness of our method, we conduct extensive experiments on two real-world public data sets. The comprehensive empirical results show that HiDR significantly outperforms state-of-the-art methods.

Research on Path Planning of Mobile Robot Based on Improved A* Algorithm

ABSTRACT. Path planning plays an essential role in mobile robot systems. Aiming at the low efficiency of the A* algorithm in robot path finding, this paper proposes two improvements to address its inefficient search. The first is to improve the distance calculation formula in the A* algorithm, and the second is to apply weighting to the evaluation function in the A* algorithm. Experimental data show that the comprehensively improved A* algorithm dramatically reduces the number of unnecessarily visited nodes and speeds up the search process, reducing the search time by about 12~14% compared to the standard A* algorithm.
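
A common way to weight the A* evaluation function is f(n) = g(n) + w * h(n) with w greater than 1, which biases the search toward the goal and usually expands fewer nodes at the cost of guaranteed optimality. The abstract does not state which weighting the authors use, so the sketch below shows only this textbook variant on a 4-connected grid with a Manhattan heuristic. The grid encoding ('#' for obstacles) and the function name are assumptions.

```python
import heapq


def weighted_astar(grid, start, goal, weight=1.5):
    """Search a 4-connected grid; return (path, number of expanded cells)."""
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(weight * heuristic(start), 0, start)]
    g_cost = {start: 0}
    parent = {start: None}
    closed = set()
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell in closed:
            continue  # stale heap entry for an already expanded cell
        closed.add(cell)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1], len(closed)
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if grid[nr][nc] == "#" or nxt in closed:
                continue
            new_g = g + 1
            if new_g < g_cost.get(nxt, float("inf")):
                g_cost[nxt] = new_g
                parent[nxt] = cell
                # Weighted evaluation function: f = g + weight * h.
                heapq.heappush(
                    open_heap, (new_g + weight * heuristic(nxt), new_g, nxt)
                )
    return None, len(closed)


open_grid = ["....."] * 5
plain_path, plain_expanded = weighted_astar(open_grid, (0, 0), (4, 4), weight=1.0)
fast_path, fast_expanded = weighted_astar(open_grid, (0, 0), (4, 4), weight=2.0)
```

Setting `weight=1.0` recovers standard A*. On the empty 5x5 grid above, the weighted run expands no more cells than the standard run, which mirrors the reduction in visited nodes that the abstract describes.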

Vessel detection based on dual-operator log-pol top-hat filter

ABSTRACT. Cyber-physical systems (CPS) have been widely used in maritime surveillance to automatically detect possible threats using infrared images over a huge oceanic area. However, environmental factors in the image and uncertainty about the ship’s direction heavily influence detection accuracy. Many existing algorithms do not take these into consideration and consequently perform poorly. To solve the above problems and provide robustness against environmental conditions, we exploit the properties of the polar-logarithmic coordinate and propose a method named “Dual-operator log-pol top-hat filter (DOLPTH)” to better take advantage of the difference information between the vessel and the background. Different comprehensive situations have been designed to test the performance of our algorithm, and we also compare our algorithm with others. The experimental results show that in both cases, DOLPTH can maintain high accuracy and good detection performance.

Reinforcement learning based joint self-optimisation scheme for fuzzy logic handover algorithm in 5G HetNets

ABSTRACT. Heterogeneous networks (HetNets) in 5G can provide higher network coverage and system capacity to the user by deploying massive numbers of small base stations (BSs) within the 4G macro system. However, the large-scale deployment of small BSs significantly increases the complexity and workload of network maintenance and optimisation. On the other hand, the current handover triggering mechanism, the A3 event, was only designed for mobility management in the macro system. Implementing the A3 event directly in 5G-HetNets may degrade the mobility robustness of the user. Motivated by the concept of self-organisation networks (SON), this paper develops a self-optimisation triggering mechanism to enable automated network maintenance and enhance the mobility robustness of the user in 5G-HetNets. The proposed method integrates the advantages of both subtractive clustering and the Q-learning framework into the conventional fuzzy logic-based handover algorithm (FLHA). Subtractive clustering is first adopted to generate the membership function (MF) for FLHA, which provides FLHA with the self-configuration feature. Subsequently, Q-learning is utilised to learn the optimal handover policy from the environment as fuzzy rules, which empower FLHA with the self-optimisation function. The FLHA with SON functionality also overcomes the limitation of the conventional FLHA, whose design relies heavily on professional experience. The simulation results show that the proposed self-optimisation FLHA can effectively generate the MF and fuzzy rules for FLHA. Compared with the conventional triggering mechanism, the proposed approach reduces the handover ratio, ping-pong handover ratio and handover failure ratio by approximately 91%, 49% and 97.5%, respectively, while improving network throughput and latency by 8% and 35%, respectively.

The Research of Mine Behavior-based Safety based on Safety Soft Power

ABSTRACT. There are a lot of fractures in the fault zone, and it is difficult to simulate the three-dimensional fracture zone. In this paper, a three-dimensional simulation method of the fracture zone based on fractal theory is presented. Exploiting the self-similarity of cracks, the fractal dimension is obtained by processing the pictures obtained from the similarity simulation material experiment. Based on this, the maximum fracture length and opening of related fault zones can be predicted. According to the empirical formula, the approximate height range of the fracture zone is calculated, and a fracture model based on a wire-frame model is designed to simulate the fractures within the fracture zone using fracture-related parameters. A number of experiments verify that the maximum crack parameters obtained by the crack-parameter prediction algorithm have smaller errors than those obtained by similar simulation material tests. The three-dimensional model of the fracture zone generated on this basis has a more realistic simulation effect, which provides a feasible method for better research and three-dimensional simulation of fracture distribution in the overburden fracture zone.

Delegate Proof of Job-Relevance: Consensus for A Smart Industry

ABSTRACT. The consensus mechanism, which sets the basic rules of a blockchain, is one of the three key technologies of blockchain, along with cryptography and data storage structures. In recent years, the application areas of blockchain technology have been further expanded due to the development of smart contracts. The customized design and improvement of consensus mechanisms, which make them more applicable to actual industries, have become a research focus in the field of blockchain applications. The theoretical background of this paper is a Smart Industry based on an Alliance Blockchain, in which smart contracts are signed among network nodes to automate transaction processes. In this paper, a new consensus mechanism named Delegate Proof of Job-Relevance (DPoJ) is proposed. It emphasizes the activity and professionalism of nodes, classifies the application areas of smart contracts, and calculates the Job-Relevance of nodes as a parameter in the equity calculation during consensus. Next, the influence of Job-Relevance on the growth rate of nodes’ equities under DPoJ consensus is analyzed from the perspective of game theory and mechanism design. Finally, it is proved that in most cases, rational nodes are more inclined to increase their equities in the process of consensus by improving their Job-Relevance.

A research on Quick-SIFT and eliminate ghosting technique of UAV image mosaic

ABSTRACT. Using image mosaicking to compensate for the insufficient coverage of UAV aerial images is an important technique for improving the practical value of UAVs. There are two common issues in research on UAV image mosaicking: running speed and mosaic accuracy. To this end, this paper first analyzes the UAV patrol characteristics and the causes of ghosting. The Quick-SIFT operator is constructed to reduce running time by reducing the Octaves and Levels of the Gaussian pyramid and selecting the third-level image in each Octave for feature point extraction. Then AANAP is used for image registration and projection transformation. The difference area between sequence images is calculated by the frame difference method. On this basis, the region growing algorithm is used to segment the moving object. Finally, the difference region is single-sampled, and the other regions are fused by linear weighting. The experimental results show that the matching time is only 1/10 of that of SIFT, which significantly improves the speed of image mosaicking. The method also effectively eliminates motion ghosting and yields more stable results of better quality.

Tangent Loss Function for Classification with Deep Neural Networks

ABSTRACT. It is known that the traditional cross-entropy (CE) loss function is sensitive to randomness introduced by training samples. We propose a novel loss function, namely tangent loss (TG), aiming to make classification models more stable while achieving comparable performance. The TG loss function trains the neural network in a way that emphasizes samples whose predictions deviate greatly from the targets at each training step. Compared with the CE loss function, TG gives a light penalty to samples whose predictions are close to the real targets, and a large exponential penalty to samples whose predictions are far from the targets. We apply the TG loss function to four classification tasks: sentence semantic matching, sentiment analysis, text classification and image classification. Extensive experimental results on real-world datasets show that the TG loss function improves the stability of classification models while obtaining classification performance better than or comparable to that of the CE loss function.

Chinese Sentence Semantic Matching Based on Multi-Granularity Fusion Model

ABSTRACT. Sentence semantic matching is the cornerstone of many natural language processing tasks, including Chinese language processing. It is well known that Chinese sentences with different polysemous words or word order may have totally different semantic meanings. Thus, to represent and match sentence semantics accurately, one challenge that must be solved is how to capture semantic features from a multi-granularity perspective, e.g., characters and words. To address this challenge, we propose a novel sentence semantic matching model based on the fusion of semantic features at character granularity and word granularity. In particular, the multi-granularity fusion is intended to extract more semantic features to better optimize the downstream sentence semantic matching. In addition, we propose the equilibrium cross-entropy, a novel loss function that uses the mean square error (MSE) as an equilibrium factor of cross-entropy. It can automatically balance the category imbalance between positive and negative samples and improve classification accuracy when the classification boundary is fuzzy. The experimental results on Chinese open data sets demonstrate that our proposed model, combined with the binary equilibrium cross-entropy loss function, is superior to existing state-of-the-art sentence semantic matching models.

A Fusion Model for Multi-label Emotion Classification Based on BERT and Topic Clustering

ABSTRACT. As one of the most critical tasks of natural language processing (NLP), emotion classification has a wide range of applications in many fields. However, restricted by corpus, semantic ambiguity, and other constraints, researchers in emotion classification face many difficulties, and the accuracy of multi-label emotion classification is not ideal. In this paper, to improve the accuracy of multi-label emotion classification, especially when semantic ambiguity occurs, we propose a fusion model for text based on self-attention and topic clustering. We use pre-trained BERT to extract the hidden emotional representations of the sentence, and use an improved LDA topic model to cluster the topics of different levels of text. Then we fuse the hidden representations of the sentence and use a classification neural network to calculate the multi-label emotional intensity of the sentence. Extensive experimental results on the Chinese emotion corpus Ren_CECPs demonstrate that our model outperforms several strong baselines and related works. The F1-score of our model reaches 0.484, which is 0.064 higher than the best results in similar studies.

Representation-guided Generative Adversarial Network for Unpaired Photo-to-caricature Translation

ABSTRACT. Imitating the painting style of a caricature is an interesting task that has been attracting a lot of attention. Creating a caricature from a photo with an imitated painting style is more challenging. Recently, image-to-image translation approaches have become popular. However, it is still difficult to design an automatic photo-to-caricature painting framework based on image-to-image translation methods, for three reasons: 1) annotating aligned photo-caricature pairs is costly, so supervised learning is not always feasible; 2) beyond style or color translation, photo-to-caricature tasks require capturing and exaggerating high-level features of the target person; 3) many existing caricatures include multiple painting styles, which further increases the difficulty of an image-to-image task. To address these challenges, we develop a novel photo-to-caricature translation method that achieves promising performance using only unpaired images. Firstly, we design a representation-guided adversarial framework to vividly transfer a selected caricature style with unsupervised learning. Secondly, we introduce a feature-pyramid adversarial network to distill the high-level features and improve the ability to synthesize plausible caricatures. Finally, we build an extra information flow between the generation and discrimination stages and present a representation-consistency loss to separate the content and the style more efficiently and help our framework provide arbitrary painting style translation. We evaluate our method on various caricature datasets to show that our model has a superior ability to comprehend, translate and paint a caricature.

Hyperspectral Image Classification Based on LBP Feature Extraction and Multi-model Ensemble Learning

ABSTRACT. Traditional hyperspectral classification pays more attention to spectral features than to spatial features, and suffers from information redundancy between spectra. To address these problems, we propose a hyperspectral classification method based on Local Binary Patterns (LBP) feature extraction and multi-model ensemble learning. It uses LBP to extract the spatial features of a hyperspectral image and then stacks the spatial features and spectral features into high-dimensional vectors. The obtained high-dimensional vectors are used for joint sparse representation, and pixel residuals are calculated. At the same time, considering the information redundancy of a hyperspectral image, we use the correlation coefficient (CC) to reduce the inter-class interference of samples. Finally, these two classification methods are combined by ensemble learning to classify the hyperspectral image. Experimental results on hyperspectral data show that, compared with traditional classification methods, the proposed method can effectively improve the classification accuracy.
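
For readers unfamiliar with LBP, the basic 3x3 operator can be sketched as follows. This is the textbook 8-neighbour variant, not necessarily the exact variant used in the paper; the clockwise bit ordering and the sample patch are assumptions made for illustration.

```python
def lbp_code(patch):
    """Return the 8-bit LBP code of the centre pixel of a 3x3 patch.

    Each neighbour contributes a 1 bit when its value is >= the centre value.
    """
    center = patch[1][1]
    # Neighbours visited clockwise, starting at the top-left corner
    # (the ordering is a convention; any fixed ordering works).
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (row, col) in enumerate(offsets):
        if patch[row][col] >= center:
            code |= 1 << bit
    return code


# Hypothetical grey-level patch taken from one spectral band.
patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
code = lbp_code(patch)
```

A histogram of such codes over a local window is the usual way to turn them into a spatial feature vector.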

Chinese Micro-blog Sentiment Classification Based on Word2vec and Vector Arithmetic

ABSTRACT. Word vector technology based on neural network language models can extract deep semantic relations and syntactic features between words from massive corpora, and has been applied in many natural language processing studies with good results. This paper extracts vector representations of emotion words and emoticons from a large-scale micro-blog corpus using word2vec and expands the emotion dictionary based on vector semantic similarity. It then analyzes the structure "negative word + emotion word" and obtains its vector representation by vector addition. Finally, it computes the emotion category vectors with an improved K-Means algorithm and designs a sentiment classifier based on the similarity between the emotion semantic unit vectors in a micro-blog and the emotion category vectors. The experimental results on a micro-blog test corpus (NLPCC2014) show that the method achieves better performance in polarity classification and emotion classification than the benchmark methods.

A Novel Ray-Casting Algorithm Using Dynamic Adaptive Sampling

ABSTRACT. The Ray-Casting algorithm is an important volume rendering algorithm that is widely used in medical image processing. To address the shortcomings of current Ray-Casting algorithms in the 3D reconstruction of medical images, such as slow rendering speed and low sampling efficiency, an improved algorithm based on dynamic adaptive sampling is proposed. Using the central difference gradient method, the sampling interval is obtained dynamically for each sampling point. Meanwhile, a new rendering operator is proposed based on the color value and opacity changes before and after the ray enters the volume element, and the resistance luminosity. Experimental results show that, compared with some existing algorithms, the proposed method renders faster while preserving the quality of the generated image.
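
The idea of deriving a sampling interval from the central difference gradient can be sketched as below. The gradient formula is the standard one; the step rule in `adaptive_step` (including `base_step`, `min_step` and `k`) is a hypothetical illustration, since the abstract does not state the paper's actual mapping from gradient to interval.

```python
import math


def central_gradient(vol, x, y, z):
    """Central difference gradient of a scalar volume at an interior voxel."""
    gx = (vol[x + 1][y][z] - vol[x - 1][y][z]) / 2.0
    gy = (vol[x][y + 1][z] - vol[x][y - 1][z]) / 2.0
    gz = (vol[x][y][z + 1] - vol[x][y][z - 1]) / 2.0
    return gx, gy, gz


def adaptive_step(grad_mag, base_step=1.0, min_step=0.25, k=1.0):
    """Hypothetical rule: smaller steps where the data changes quickly."""
    return max(min_step, base_step / (1.0 + k * grad_mag))


# Toy 3x3x3 volume whose density is 2*x + y (constant along z).
vol = [[[2 * x + y for z in range(3)] for y in range(3)] for x in range(3)]
grad = central_gradient(vol, 1, 1, 1)
step = adaptive_step(math.sqrt(sum(g * g for g in grad)))
```

In homogeneous regions the gradient magnitude is near zero and the ray advances by `base_step`; near boundaries the step shrinks towards `min_step`.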

Uneven Illumination Surface Inspection Using Independent Component Analysis

ABSTRACT. Defects on low-contrast LCD surfaces are difficult to detect under uneven illumination. To solve this problem, an independent component analysis (ICA) model is proposed for defect detection. The ICA model takes defect-free training images as input and yields the inverse mixing matrix from which the filter is designed. In the inspection process, the convolution between the filter and the test image is calculated, and thresholds are set so that the impulse responses of all pixels in the defect-free area fall within the control range and those of all pixels in the defect area fall outside it. Finally, the experimental results verify the accuracy of the proposed method in detecting defects on low-contrast surfaces under uneven illumination.

Chinese Poetry and Couplet Automatic Generation Based on Self-attention and Multi-task Neural Network Model

ABSTRACT. Poetry and couplets, as a valuable part of human cultural heritage, carry traditional Chinese culture. Automatic couplet generation and poetry writing are challenging for NLP. This paper proposes a new multi-task neural network model for the automatic generation of ancient poems and couplets. The model uses a seq2seq encoder-decoder structure that combines an attention mechanism and a self-attention mechanism. In the encoding part, we use hard parameter sharing. The encoding part uses two BiLSTM networks to learn the similar characteristics of ancient poems and couplets, one for encoding keywords and the other for encoding generated poem or couplet sentences. The decoding parameters are not shared. The decoder consists of two LSTM networks that decode the output of ancient poems and couplets, respectively, in order to preserve the different semantic and grammatical features of ancient poems and couplets. Poetry and couplets have many similar characteristics, and multi-task learning can learn more features through related tasks, making the model generalize better. As a result, generating ancient poems and couplets with the multi-task model is significantly better than with a single-task model. In addition, our model introduces a self-attention mechanism to learn the dependencies and internal structure of words in sentences. Finally, the effectiveness of the method is verified by automatic and manual evaluations.

Single Image Dehazing Based on Fusion of Dark Channel and Color Attenuation Prior Transmittance

ABSTRACT. Single image haze removal has been a challenging problem due to its ill-posed nature. In this paper, we propose a simple but effective image prior: the transmittance fusion of the dark channel prior and the color attenuation prior. By constructing a weight factor for the transmittance, the respective advantages of both priors can be well exploited. Brightness information makes the algorithm more adaptive in the dehazing process. Experiments on a large amount of data show that the proposed method effectively enhances the dehazed image.
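
The dark channel and a weighted fusion of two transmittance maps can be sketched as follows. The dark channel and the estimate t = 1 - omega * dark follow the well-known dark channel prior formulation (assuming the image has already been normalised by the atmospheric light); the scalar `weight` in `fuse_transmission` is a hypothetical stand-in, because the abstract does not give the paper's weight factor.

```python
def dark_channel(img, radius=1):
    """Dark channel of an H x W image of (r, g, b) floats in [0, 1].

    For each pixel: minimum over the colour channels, then minimum over a
    (2 * radius + 1) square window clipped at the image border.
    """
    h, w = len(img), len(img[0])
    per_pixel_min = [[min(px) for px in row] for row in img]
    dark = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            dark[i][j] = min(
                per_pixel_min[a][b]
                for a in range(max(0, i - radius), min(h, i + radius + 1))
                for b in range(max(0, j - radius), min(w, j + radius + 1))
            )
    return dark


def transmission_from_dark(dark, omega=0.95):
    """Dark channel prior estimate: t = 1 - omega * dark."""
    return [[1.0 - omega * d for d in row] for row in dark]


def fuse_transmission(t_dark, t_color, weight):
    """Hypothetical fusion: per-pixel convex combination of two maps."""
    return [
        [weight * a + (1.0 - weight) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(t_dark, t_color)
    ]


# Tiny hypothetical 2x2 hazy image.
img = [
    [(0.8, 0.6, 0.7), (0.9, 0.9, 0.9)],
    [(0.5, 0.4, 0.6), (1.0, 1.0, 1.0)],
]
dark = dark_channel(img, radius=1)
```

A spatially varying weight map could replace the scalar `weight` without changing the structure of the fusion step.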

Detection and localization of scorebox in long duration broadcast sports videos

ABSTRACT. Many studies have been devoted to sports video summarization and content-based video search. However, the semantic importance of the caption box or scorebox (SB) appearing in broadcast sports videos has been almost neglected, even though the SB holds key elements for conducting these research tasks. SB localization is challenging because there is a huge variety of SBs, and almost every broadcast sports video contains a different SB with unique features such as geometry, font, colors, location, and texture. Every time a new sports series emerges, it contains a new type of scorebox that does not resemble that of any other sports series. One can say that SBs are evolving with unexpected features and novel challenges. Thus, traditional learning-based methods alone are not suitable for detection. This paper proposes a robust method for detecting and localizing SBs appearing in broadcast sports videos. It automatically learns the template of the SB and further utilizes the template, as the SB may shift from its usual location and may disappear for a short time. We performed comprehensive experiments on a real-life dataset, SP-1, and comparison with state-of-the-art methods shows that the proposed method achieves better performance.

ECDet: An Efficient ConvNet for Real-time Object Detection

ABSTRACT. This paper introduces a lightweight convolutional neural network, called ECDet, for accurate real-time object detection. In contrast to recent lightweight networks that prefer to use pointwise convolution to change the number of feature-map channels, ECDet builds the whole backbone network architecture from equal-channel blocks. Meanwhile, we deploy depthwise convolution to further compress the feature pyramid network (FPN) detection head. The experiments show that ECDet has a model size of only 3.19 M and needs only 3.48B FLOPs with a 416×416 input image. Comprehensive experiments demonstrate that our model achieves a promising speed-accuracy trade-off on the PASCAL VOC 2007 dataset.
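
Why depthwise convolution compresses a detection head can be seen from a multiply-accumulate (MAC) count. The sketch below compares a standard convolution with a depthwise convolution followed by a pointwise one; this pairing and the layer size (52x52 feature map, 128 channels, 3x3 kernel) are assumptions chosen for illustration, not figures from the paper.

```python
def standard_conv_macs(height, width, c_in, c_out, kernel):
    """MACs of a standard convolution (stride 1, 'same' padding)."""
    return height * width * c_in * c_out * kernel * kernel


def depthwise_separable_macs(height, width, c_in, c_out, kernel):
    """MACs of a depthwise convolution followed by a 1x1 pointwise one."""
    depthwise = height * width * c_in * kernel * kernel
    pointwise = height * width * c_in * c_out
    return depthwise + pointwise


# Hypothetical head layer: 52x52 feature map, 128 -> 128 channels, 3x3 kernel.
std = standard_conv_macs(52, 52, 128, 128, 3)
sep = depthwise_separable_macs(52, 52, 128, 128, 3)
ratio = sep / std  # equals 1 / c_out + 1 / kernel**2
```

For this hypothetical layer the ratio is 1/128 + 1/9, so the separable form needs roughly 12% of the MACs of the standard convolution.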

Modulation Pattern Recognition of M-QAM Signals Based on Convolutional neural network And Extreme learning machine

ABSTRACT. In this paper, we propose a workflow and a deep learning algorithm for recognizing quadrature amplitude modulation (QAM) signals. The design adopts a convolutional neural network (CNN) and an Extreme Learning Machine (ELM) as its core, leveraging the powerful feature extraction of the CNN and the fast classification learning of the ELM. The spectrogram image features of the signal obtained by the short-time Fourier transform (STFT) are input to the CNN-ELM hybrid model, and the modulation mode of the QAM signal is finally recognized by the ELM. The algorithm overcomes the shortcomings of traditional methods, and simulation results verify the superiority of the proposed system, whose classification accuracy exceeds 99.86%.

An End-to-End Reinforcement Learning Method for AGV Path Planning

ABSTRACT. As automatic transport vehicles, AGVs (Automated Guided Vehicles) play an important role in improving the efficiency of production logistics systems. We propose a novel end-to-end AGV path planning method that lets the AGV obtain the optimal action from the raw visual image and lidar information. In addition, so that the AGV can work normally in different environments, we adopt a deep reinforcement learning method to train the AGV, combining a prioritized experience replay mechanism and Double DQN (Deep Q Network) with the dueling architecture, which gives the AGV a certain generalization ability for unknown environments and adaptability to dynamic environments. Experiments show the effectiveness of the proposed method.
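
The Double DQN component named in the abstract differs from plain DQN in how the bootstrap target is formed: the online network selects the next action and the target network evaluates it. A minimal sketch, with Q-values passed in as plain lists and all numbers hypothetical:

```python
def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double DQN target for one transition.

    q_online_next / q_target_next hold the Q-values of every action in the
    next state, as produced by the online and the target network.
    """
    if done:
        return reward
    # Action selection uses the online network ...
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    # ... while its value is read from the target network.
    return reward + gamma * q_target_next[best_action]


# Hypothetical values for a three-action AGV step.
target = double_dqn_target(
    reward=1.0,
    done=False,
    q_online_next=[0.2, 0.9, 0.5],
    q_target_next=[1.0, 0.4, 2.0],
    gamma=0.5,
)
```

Note that the target uses 0.4 (the target network's value for the action the online network prefers) rather than 2.0, which is what reduces the overestimation seen with a plain max.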

Collaborative Filtering Recommendation Algorithm Based on Bisecting K-means Clustering

ABSTRACT. The traditional collaborative filtering recommendation algorithm suffers from data sparsity and poor scalability. To address these problems, an improved collaborative filtering algorithm is proposed. The algorithm first preprocesses the rating data matrix with the Weighted Slope One algorithm, removing unrated items to reduce its sparsity. Then the preprocessed rating data is clustered with the bisecting K-means algorithm, which reduces the nearest-neighbor search space of the target user by grouping similar objects, thereby improving the algorithm’s scalability. Finally, the traditional recommendation algorithm is used to generate the final result. Experimental results show that the improved algorithm improves the recommendation effect.
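
Bisecting K-means, as used for the clustering step, repeatedly splits one cluster in two until k clusters exist. The sketch below picks the cluster with the largest sum of squared errors (SSE) for each split, which is one common choice and an assumption here; the sample points stand in for hypothetical user rating vectors.

```python
import random


def _dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points):
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def _sse(points):
    center = _mean(points)
    return sum(_dist2(p, center) for p in points)


def _two_means(points, rng, iters=20):
    """Plain 2-means (Lloyd iterations) on one cluster."""
    centers = rng.sample(points, 2)
    groups = ([], [])
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            nearest = 0 if _dist2(p, centers[0]) <= _dist2(p, centers[1]) else 1
            groups[nearest].append(p)
        if not groups[0] or not groups[1]:
            break
        centers = [_mean(groups[0]), _mean(groups[1])]
    return groups


def bisecting_kmeans(points, k, seed=0):
    """Split the cluster with the largest SSE until k clusters exist."""
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        idx = max(range(len(clusters)), key=lambda i: _sse(clusters[i]))
        if len(clusters[idx]) < 2:
            break  # nothing left to split
        left, right = _two_means(clusters[idx], rng)
        if not left or not right:
            break  # degenerate split, e.g. duplicate points
        clusters[idx:idx + 1] = [left, right]
    return clusters


# Hypothetical 2-D "rating vectors" forming three loose groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (21, 0)]
clusters = bisecting_kmeans(points, 3)
```

Neighbour search for a target user can then be restricted to the cluster that contains that user, which is the search-space reduction the abstract refers to.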

Semantic segmentation of manipulator grasping scene with fusion of RGB and depth information

ABSTRACT. Semantic segmentation with RGB image information is very useful for intelligent perception in robotics. However, semantic segmentation with only RGB information does not perform well for objects of the same color during grasping manipulation. This paper proposes a new semantic segmentation scheme based on the fusion of RGB and heights transformed from depth information, which is not a simple RGB-D fusion method. It modifies the height information so that different objects of the same color can be distinguished by height. It outperforms the classical RGB segmentation scheme in improving speed and is 7.42% higher in the final semantic segmentation performance on manipulator grasping scenes (which contain objects of the same color). Because RGB-D information is needed, this paper also proposes a method for self-collecting and self-labeling data of manipulator grasping scenes, which reduces manpower cost by making full use of the highly automated equipment and the characteristics of the scene.

Multi-model Transfer and Optimization for Cloze Task

ABSTRACT. Substantial progress has been made recently in training context-aware language models. CLOTH is a human-created cloze dataset, which can better evaluate machine reading comprehension. Although the authors of CLOTH have done many experiments on BERT and context2vec, it is still worth studying the performance of other models. We applied the CLOTH dataset to pre-trained models (PTMs) in the NLP field and evaluated their performance based on different model mechanisms. The results showed that ALBERT performed well on the cloze task. In addition, we introduce adversarial training into the model to improve its generalization ability. Experiments show that adversarial training significantly improves the robustness and accuracy of the model.

Chinese Clinical Named Entity Recognition based on Stacked Neural Network

ABSTRACT. Precise named entity recognition (NER) is a key component in Chinese clinical natural language processing. Though clinical NER systems have attracted widespread attention and been studied for decades, the latest NER research usually relies on a shallow text representation with one-layer neural encoding, which fails to capture deep features and limits performance improvement. To capture more features and encode the clinical text efficiently, we propose a deep stacked neural network for Chinese clinical NER. The neural network stacks two bidirectional LSTM (long short-term memory) and GRU (gated recurrent unit) layers to encode the text twice, followed by a CRF layer to recognize the named entities in Chinese clinical text. Extensive empirical results on three real-world data sets demonstrate that our method significantly outperforms four state-of-the-art NER methods.

Analysis of behavior trajectory based on deep learning in ammonia environment for fish

ABSTRACT. Ammonia is produced by the respiration and excretion of fish during farming and can affect the life of the fish. In this paper, to study the behavior of fish under different ammonia concentrations and provide corresponding judgment and early warning of abnormal fish behavior, different ammonia environments are simulated by adding ammonium chloride to the water. Unlike existing methods based on direct manual observation or manual marking, this paper proposes an approach for recognizing and analyzing behavior trajectories based on deep learning. Firstly, the three-dimensional spatial trajectories of the fish are drawn by three-dimensional reconstruction. Then the influence of different ammonia concentrations on the fish is analyzed according to their behavior trajectories at those concentrations. The results of comparative experiments show that the movement and vitality of the fish decrease significantly and the fish often remain motionless in water containing ammonium chloride. The proposed approach can provide a new idea for animal behavior analysis.

18:20-19:30 Session 4: Keynote Session (This program is scheduled in Japan Standard Time Zone)

・Keynote 9:  Prof. Hao Gao, Nanjing University of Posts and Telecommunications, China

   Evolutionary Computation based Image Segmentation Algorithm

・Keynote 10: Prof. Jože Guna, University of Ljubljana, Slovenia

   VR Challenges and Applications

19:30-20:00 Session 5: Poster Session (This program is scheduled in Japan Standard Time Zone)
Research on Character Recognition Technology Based on Neural Network

ABSTRACT. Convolutional neural networks are currently popular multi-layer neural networks. They differ from traditional neural networks mainly in the introduction of three new concepts: weight sharing, receptive fields, and pooling. In this paper, a LeNet network is constructed for training and recognition on a handwritten digit data set, and the data is augmented in different ways to study and compare the recognition accuracy of the final network structure.

Improving Performance of Dictionary Learning via Auxiliary Training Samples and Robust Dictionary

ABSTRACT. A key to dictionary learning is to attain a robust dictionary, which enables difference between test samples and training samples of the same class to be alleviated. Owing to this factor, the dictionary can bring proper representations of test samples and produce better classification results for them. For face recognition, because of varying facial appearance caused by changeable illuminations, poses and facial expressions, a robust dictionary is definitely preferred. In this paper, we propose a robust dictionary learning method for face recognition. Robustness is attained in a two-fold way. First, auxiliary faces are produced via original face images. Second, the scheme to attain the dictionary under the condition that label coefficients can deviate from sample coefficients is designed. Auxiliary faces express possible variations of faces. Moreover, it seems that difference between auxiliary faces and original training samples of the same class somewhat reflects difference between test samples and training samples, thus use of auxiliary faces is beneficial to improve robustness of the method. The scheme to attain the dictionary further enhances robustness.