13:00-18:00 Session 12: Ph.D. Forum
A Big Dataset for Vehicle Logo Recognition and Detection

ABSTRACT. Vehicle logo detection (VLD) is a special and significant topic in object detection for vehicle identification system applications. Nevertheless, research on VLD in real, complex scenes remains narrow in scope, even though VLD plays a critical role in the detection of small objects. In this paper, we further analyze vehicle logo recognition and detection in real-world situations. To begin with, we propose a new multi-class VLD dataset, called VLD-45 (Vehicle Logo Dataset), which contains 45000 images and 50359 objects from 45 categories. Our new dataset poses several research challenges, including small object sizes, shape deformation, and low contrast. Furthermore, we use 6 existing classifiers and 6 detectors to evaluate our dataset and report baseline performance. According to the results, our dataset has significant research value for the task of small-scale object detection.

3D Image Segmentation

ABSTRACT. Big data-driven deep learning methods have been widely used in image segmentation. However, the main challenge is that a large amount of labeled data is required to train a well-performing deep learning model, which is impractical in industrial applications. Meta-learning, as one of the most promising research areas, is believed to be a key tool for computer vision. To this end, this paper summarizes the state-of-the-art methods and the current state of image segmentation based on meta-learning, and points out the future trends of meta-learning. First, we introduce the fundamental concepts of image segmentation and meta-learning and summarize the related meta-learning methods. Then, a comprehensive benchmark study is carried out to compare representative meta-learning methods in image segmentation. Finally, based on the results of the benchmark study, the future trends of meta-learning are discussed.

Global-PBNet for Point Cloud Registration

ABSTRACT. Registration is a transformation estimation problem between point clouds, which plays a unique and critical role in many computer vision applications. Deep learning-based methods have improved the robustness and efficiency of registration, but they easily fall into local optima, which limits the final accuracy. Traditional optimization-based methods achieve better precision, but their performance depends on the quality of the initialization. In this paper, we propose PBNet, which combines a point cloud network with a global optimization method. This framework uses the feature information of objects to perform a high-precision rough registration, and then searches the entire 3D motion space using branch-and-bound and iterative closest point (ICP) methods. The evaluation results show that PBNet can greatly reduce the influence of initial values on registration, and is robust against noise and outliers. Our approach represents a 16% improvement over the current baseline.
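The local refinement step named in the abstract above (iterative nearest point, more commonly called iterative closest point or ICP) can be illustrated with a minimal sketch. This is not the authors' PBNet: it is a plain 2D ICP loop, simplified from 3D so that the rigid fit has a closed form without a linear algebra library, and the function names `nearest`, `best_rigid_2d`, `transform`, and `icp_2d` are hypothetical. It shows why such methods depend on initialization: each iteration trusts the nearest-neighbor matches produced by the current pose.

```python
import math

def nearest(p, points):
    """Return the point in `points` closest to p (brute-force search)."""
    return min(points, key=lambda q: (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

def best_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping paired src onto dst."""
    n = len(src)
    csx = sum(p[0] for p in src) / n
    csy = sum(p[1] for p in src) / n
    cdx = sum(p[0] for p in dst) / n
    cdy = sum(p[1] for p in dst) / n
    cos_sum = 0.0
    sin_sum = 0.0
    for (sx, sy), (dx, dy) in zip(src, dst):
        ax, ay = sx - csx, sy - csy      # centered source point
        bx, by = dx - cdx, dy - cdy      # centered target point
        cos_sum += ax * bx + ay * by     # dot product term
        sin_sum += ax * by - ay * bx     # 2D cross product term
    theta = math.atan2(sin_sum, cos_sum)
    c, s = math.cos(theta), math.sin(theta)
    tx = cdx - (c * csx - s * csy)
    ty = cdy - (s * csx + c * csy)
    return theta, tx, ty

def transform(points, theta, tx, ty):
    """Rotate by theta, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def icp_2d(src, dst, iterations=20):
    """Repeatedly match each point to its nearest target and re-fit the pose."""
    current = list(src)
    for _ in range(iterations):
        matched = [nearest(p, dst) for p in current]
        theta, tx, ty = best_rigid_2d(current, matched)
        current = transform(current, theta, tx, ty)
    return current
```

With a small initial misalignment the nearest-neighbor matches are correct and the loop converges quickly; with a large one the matches can be wrong and the loop may settle in a local optimum, which is the weakness a global search such as branch-and-bound is meant to address.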

A Network for Image Segmentation with Language Referring Expression and Comprehension

ABSTRACT. Image segmentation with language referring expression segments an object based on the text of an expression. Existing image segmentation methods show good results on high-performance computers, but most real-world applications require both real-time speed and high accuracy. At present, most methods cannot meet these requirements well. Therefore, we propose a high-precision, real-time network that integrates the two tasks of image segmentation with language referring expression and referring expression comprehension, treating them as two branches. Specifically, the network first merges the two tasks. The feature maps of different scales extracted by each branch are then fed back to the two branches, which make their respective predictions. The two tasks promote and constrain each other. Experiments show that our method has better real-time performance and higher accuracy than existing methods.

Multi-Feature Fusion Point Cloud Completion Network

ABSTRACT. In the real world, 3D point cloud data is generally obtained by LiDAR scanning. However, real-world objects occlude one another, which causes the point cloud scanned by LiDAR to be partially missing. In this paper, we improve PF-Net (a learning-based point cloud completion network) so that it better captures the features of the point cloud. Specifically, our improved network has an encoder-decoder-discriminator structure, which can directly take the incomplete point cloud data as input without additional preprocessing. In the encoder, we use the ALL-MLP method for feature extraction of the point cloud. It combines the features obtained by each convolution in the feature extraction process and sends them to the decoder. Our experiments show that the improved network has better accuracy in most categories than the state-of-the-art methods, and generates a relatively complete point cloud, achieving the goal of completing the missing point cloud data.
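The idea of combining the features produced at each stage of feature extraction can be sketched as follows. The abstract does not specify how ALL-MLP combines them, so this sketch assumes a common choice for point cloud encoders: a channel-wise max over all points for each layer, followed by concatenation. The function names `max_pool` and `fuse_layer_features` are hypothetical.

```python
def max_pool(features):
    """Channel-wise max over points.

    `features` is a list of per-point feature vectors of equal length.
    """
    return [max(channel) for channel in zip(*features)]

def fuse_layer_features(per_layer_features):
    """Pool each layer's per-point features and concatenate the results.

    Assumption: pooling plus concatenation stands in for the fusion step;
    the abstract does not give the exact operation.
    """
    fused = []
    for features in per_layer_features:
        fused.extend(max_pool(features))
    return fused
```

The fused vector keeps information from shallow and deep layers alike, instead of passing only the last layer's output to the decoder.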

Realtime Single-stage Instance Segmentation Network Based on Anchors

ABSTRACT. In this work, we propose an instance segmentation method that uses a single-stage detector. It is simpler and easier to train than the traditional two-stage method. It does not rely on traditional region proposals, but operates directly on pixels, which reduces the complexity of the network and significantly increases the speed. Our segmentation method is based on anchor boxes, and performs multi-scale detection by setting anchors of different sizes on multi-scale feature maps. We add a new branch to the prediction head to generate prototype masks and mask coefficients, then linearly combine them to generate the mask. In our experiments, the proposed model achieves better performance. We reach 35.12 fps on a single NVIDIA GeForce RTX 2080 GPU, which shows that our method is simple, effective, and fast.
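The linear combination of prototype masks and mask coefficients described above can be sketched in a few lines. This is an illustrative sketch, not the authors' network: the sigmoid and the 0.5 threshold are assumptions borrowed from common practice in prototype-based instance segmentation, and `assemble_mask` is a hypothetical name.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def assemble_mask(prototypes, coefficients, threshold=0.5):
    """Combine k prototype masks into one binary instance mask.

    prototypes:   list of k grids (each H rows by W columns of floats)
    coefficients: list of k floats predicted for one detected instance
    Assumption: a sigmoid followed by thresholding turns the weighted
    sum into a binary mask.
    """
    height = len(prototypes[0])
    width = len(prototypes[0][0])
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            # Weighted sum of all prototypes at this pixel.
            s = sum(c * p[y][x] for c, p in zip(coefficients, prototypes))
            row.append(1 if sigmoid(s) > threshold else 0)
        mask.append(row)
    return mask
```

Because the prototypes are shared by all instances in an image, each instance only needs its own short coefficient vector, which keeps the per-instance cost low.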

Generative Robotic Grasping Using Depthwise Separable Convolution

ABSTRACT. In this paper, we present an efficient deep learning method for grasp detection. In actual grasping tasks, it is often necessary to quickly identify and grasp various unseen objects. Our method processes depth images in real time, avoiding the discrete sampling of grasp candidates and the long computation time it entails. It uses depthwise and pointwise convolutions to model the relations between channels and directly parameterizes a grasp quality for every pixel. Our method computes a grasp rectangle to generate the grasp pose for the input image. Experimental evaluation on the Jacquard dataset shows that our method can effectively predict grasp points for objects of novel classes.
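The efficiency argument for depthwise separable convolution can be made concrete by counting weights. The sketch below compares a standard convolution with a depthwise convolution followed by a pointwise (1x1) convolution; biases are ignored, and the layer sizes in the usage note are hypothetical, not taken from the paper.

```python
def standard_conv_params(kernel, in_channels, out_channels):
    """Weights in a standard convolution: one k x k filter per input-output pair."""
    return kernel * kernel * in_channels * out_channels

def separable_conv_params(kernel, in_channels, out_channels):
    """Weights in a depthwise separable convolution (biases ignored)."""
    depthwise = kernel * kernel * in_channels  # one k x k filter per input channel
    pointwise = in_channels * out_channels     # 1 x 1 convolution mixing channels
    return depthwise + pointwise
```

For example, with a 3x3 kernel, 32 input channels, and 64 output channels, the standard layer has 18432 weights while the separable one has 2336, roughly an eightfold reduction.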