ISVC'20: 15TH INTERNATIONAL SYMPOSIUM ON VISUAL COMPUTING
PROGRAM FOR WEDNESDAY, OCTOBER 7TH

09:00-10:00 Session 15: Keynote - Ross Maciejewski
Location: K
09:00
Fun with Visualization in the Data Deluge

ABSTRACT. From smart phones to fitness trackers to sensor-enabled buildings, data is currently being collected at an unprecedented rate. Now, more than ever, data exists that can be used to gain insight into questions that run the gamut from nonsensical to essential. One key technology for gaining insight into data is visualization. In this talk, we will explore how visualization can be leveraged to help us entertain fun and unique questions in the data deluge. We will investigate how social media can help us predict the next blockbuster film, how much information your name carries, how Google Street View can open a world of questions for urban planners, and more. By thinking about fun questions for datasets, we will demonstrate how visual computing can help build cross-domain collaborations, paving the way to discover new insights and challenges.

10:10-11:10 Session 16A: Deep Learning II
Chair:
Location: A
10:10
A Deep Genetic Programming based Methodology for Art Media Classification Robust to Adversarial Perturbations
PRESENTER: Gustavo Olague

ABSTRACT. Art media classification is an active research area that has attracted attention due to the complex feature extraction and analysis required for high-value art pieces. The perception of an artwork's attributes should not be subjective, as humans sometimes follow a biased interpretation of artworks, so the trustworthiness of automated observation must be ensured. Machine learning has outperformed earlier approaches in many areas by learning feature representations from images instead of relying on handcrafted feature detectors. However, a serious concern about its reliability has emerged because intentionally crafted small perturbations of the input image (adversarial attacks) can change a model's prediction completely. We foresee two ways to approach the situation: (1) solve the problem of adversarial attacks within current neural network methodologies, or (2) propose a different approach that can challenge deep learning without being affected by adversarial attacks. The first problem has not been solved yet, and adversarial attacks have become even harder to defend against. Therefore, this work presents a Deep Genetic Programming method that competes with deep learning and studies the transferability of adversarial attacks using two artwork databases made by art experts. The results show that the Deep Genetic Programming method preserves its performance in comparison with AlexNet, making it robust to these perturbations and competitive with the performance of deep learning.

10:30
rcGAN: Learning a generative model for arbitrary size image generation

ABSTRACT. We introduce rcGAN, a new generative method capable of synthesising arbitrarily sized, high-resolution images derived from a single reference image used to train our model. Our two-step method uses a randomly conditioned convolutional generative adversarial network (rcGAN) trained on patches obtained from a reference image. It captures the distribution of the reference image's internal patches and then produces high-quality samples that share the same visual attributes with this image. After training, the rcGAN recursively generates an arbitrary number of samples which are then stitched together to produce an image whose size is determined by the number of samples used to synthesise it. Our proposed method can provide a practically infinite number of variations of a single input image that offer enough variability while preserving the essential large-scale constraints. We experiment with our two-step method on many types of models, including textures, building facades and natural landscapes, comparing very favourably against other methods.

10:50
Sketch-Inspector: a Deep Mixture Model for High-Quality Sketch Generation of Cats
PRESENTER: Yunkui Pang

ABSTRACT. With the involvement of artificial intelligence (AI), sketches can be automatically generated under certain topics. Even though breakthroughs have been made in previous studies in this area, a relatively high proportion of the generated figures are too abstract to recognize, which illustrates that AIs fail to learn the general pattern of the target object when drawing. This paper posits that supervising the process of stroke generation can lead to a more accurate sketch interpretation. Based on that, a sketch generating system with an assistant convolutional neural network (CNN) predictor that suggests the shape of the next stroke is presented in this paper. In addition, a CNN-based discriminator is introduced to judge the recognizability of the end product. Since the baseline model is ineffective at generating multi-class sketches, we restrict the model to produce one category. Because the image of a cat is easy to identify, we consider cat sketches selected from the QuickDraw dataset. This paper compares the proposed model with the original Sketch-RNN on 75K human-drawn cat sketches. The result indicates that our model produces sketches of higher quality than human-drawn sketches.

10:10-11:10 Session 16B: Virtual Reality
Location: B
10:10
Walking in a Crowd Full of Virtual Characters: Effects of Virtual Character Appearance on Human Movement Behavior
PRESENTER: Christos Mousas

ABSTRACT. This paper describes a study on the effects that a virtual crowd composed of virtual characters with different appearances has on human movement behavior in a virtual environment. The study examines five virtual crowd conditions that include the following virtual characters: neutral, realistic, cartoon, zombie, and fantasy. Participants were instructed to cross a virtual crosswalk, and each time one of the examined crowd conditions was shown. The movement behavior of participants was captured and objectively analyzed based on four measurements (speed, deviation, distance traveled, and interpersonal distance). It was found that the appearance of the virtual characters significantly affected the movement behavior of participants. Participants walked slower when exposed to a virtual crowd composed of cartoon characters and faster when exposed to fantasy characters. Moreover, participants deviated more when exposed to a crowd composed of fantasy characters compared to cartoon and zombie characters. Finally, the interpersonal distance between participants and fantasy characters was significantly greater compared to human and zombie characters. Our findings, limitations, and future directions are discussed in the paper.

10:30
Improving Chinese Reading Comprehensions of Dyslexic Children via VR Reading
PRESENTER: Chiu Yung Fu

ABSTRACT. Dyslexia imposes neurological limitations on its patients, such that they have poor phonological awareness and orthographical skills. These in turn limit the patients' ability to derive meaning from words, which is key to effective reading. To aid dyslexics in their comprehension, a top-down approach to reading is proposed. In the meantime, a graphical model is also proposed as a tool to help researchers pinpoint neurological processes. It clearly shows that the top-down approach could bypass dyslexic patients' neurological limitations. It is also hypothesized that by aiding their understanding of articles and words, it is possible for patients to improve their phonological awareness and orthographical skills. Our implementation of the research goals is VR reading, which uses multimedia feedback to give dyslexic students cues on the meaning of words and articles. VR reading consists of aiding images, voice-overs, videos, and a background theme dome that give encapsulated cues on the meanings of the article and its words, detached from the article itself. This is an important design decision, as we want dyslexic students to rely more on multimedia feedback in deriving the meaning. We also present a preliminary evaluation, a step toward testing the aforementioned hypotheses with VR reading, in which primary school children read a Chinese article and are evaluated afterwards. The results seem to indicate that VR reading is useful in aiding students in their reading comprehension and, additionally, has the potential to improve their phonological awareness and orthographical skills.

10:50
Improving User Experience in Augmented Reality Mirrors with 3D Displays
PRESENTER: Gun Lee

ABSTRACT. Optical-reflection type Augmented Reality (AR) mirror displays use half-silvered mirrors attached in front of a digital display to show virtual objects overlaid onto the physical world reflected in the mirror. Prior works mostly displayed 2D images on the surface of the mirror and hence suffered from visual depth mismatch between the optical reflection of the 3D physical space and the virtual image. In this paper, we propose to use 3D visualisation to overcome this problem and improve the user experience by providing better depth perception for watching and interacting with the content displayed on an AR mirror. As a proof-of-concept, we developed two prototype optical-reflection type 3D AR mirror displays, one using a multi-view autostereoscopic 3D display and another using a head-tracked stereoscopic 3D display that supports hand gesture interaction. A preliminary user study showed that the participants were able to perform selection tasks faster and with less error under 3D visualisation, and felt the 3D visualisation required less mental effort and was more comfortable to watch and interact with compared to the traditional 2D visualisation. The results also indicated the participants felt the virtual image was closer to their body, supporting the visual perception model of the 3D AR mirror we postulated.

10:10-11:10 Session 16C: Special Track: Computer Vision Advances in Geo-Spatial Applications and Remote Sensing
Chair:
Location: C
10:10
Natural Disaster Building Damage Assessment Using a Two-Encoder U-Net
PRESENTER: Billy Ermlick

ABSTRACT. When a natural disaster occurs, damaged regions rely on timely damage assessments to receive relief. Currently, this is a slow and laborious process, during which emergency response groups conduct on-the-ground evaluations to form fiscal estimates. This project attempts to expedite relief efforts by applying novel computer vision algorithms to satellite images to quickly and accurately estimate the physical and fiscal damage caused by natural disasters. We investigated and modified U-Net architectures to jointly localize buildings, classify damage, and perform change detection. In particular, we added a second encoder to the U-Net to simultaneously process pre- and post-event imagery, with both encoders sharing weights. In addition, the decoder is trained to both locate buildings and classify damage estimates by formulating this as a single semantic segmentation problem. This enables us to produce damage estimates in one pass without needing to re-visit pixels (i.e. detection + classification tasks). Finally, we added a downstream task that provides a pixel-based financial model capable of outputting financial costs according to the United States National Grid (USNG) coordinate system.
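A minimal PyTorch-style sketch of the shared-weight, two-encoder idea described in this abstract; the class names, channel counts, and single down/up level are illustrative and not the authors' implementation:

```python
# Sketch of a two-encoder U-Net where the same encoder weights process
# pre- and post-event imagery; the decoder outputs per-pixel damage logits.
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True))
    def forward(self, x):
        return self.block(x)

class TwoEncoderUNet(nn.Module):
    def __init__(self, n_classes=5):
        super().__init__()
        self.encoder = ConvBlock(3, 64)            # one encoder, reused for both images
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = ConvBlock(128, 128)      # fuses pre/post features
        self.up = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.head = nn.Conv2d(64, n_classes, 1)    # joint localization + damage classes

    def forward(self, pre_img, post_img):
        f_pre = self.pool(self.encoder(pre_img))
        f_post = self.pool(self.encoder(post_img))   # same weights applied again
        fused = self.bottleneck(torch.cat([f_pre, f_post], dim=1))
        return self.head(self.up(fused))             # per-pixel segmentation logits
```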

10:30
Understanding Flooding Detection Using Overhead Imagery - Lessons Learned
PRESENTER: Abdullah Said

ABSTRACT. Floods are one of the most devastating and costly natural disasters, posing a significant threat to human life and property, and necessitating a systematic and timely response to flood risks. While most floods cannot be prevented, they can be detected, and a quick response can greatly reduce the consequences. Recent advancements in artificial intelligence, computing power, and earth observation data availability have enabled researchers to use computer vision and satellite/aerial imagery to help assess ground conditions and help decision-makers prioritize response efforts. This paper investigates different algorithmic design decisions to determine the best flood line detection performance. We also investigated the value of adding non-imagery proxy data used for flood prediction into a computer vision pipeline, combining Height Above Nearest Drainage (HAND)-based inundation map data and aerial imagery to train a semantic segmentation convolutional neural network. In our experiments, we trained several U-Net shaped fully convolutional neural networks using aerial imagery of Hurricane Harvey retrieved from the National Oceanic and Atmospheric Administration (NOAA) repositories, and rasterized HAND map data retrieved from the Texas Advanced Computing Center (TACC). The paper contributes by showcasing the results of combining a hydrologic and a computer vision method for flood detection.

10:50
Hyperspectral Image Classification via Pyramid Graph Reasoning
PRESENTER: Tinghuai Wang

ABSTRACT. Convolutional neural networks (CNN) have made significant advances in hyperspectral image (HSI) classification. However, standard convolutional kernels neglect the intrinsic connections between data points, resulting in poor region delineation and small spurious predictions. Furthermore, HSIs have a unique continuous distribution along the high dimensional spectrum domain - much remains to be addressed in characterizing the spectral contexts considering the prohibitively high dimensionality and in improving reasoning capability in light of the limited amount of labelled data. This paper presents a novel architecture which explicitly addresses these two issues. Specifically, we design an architecture to encode the multiple spectral contextual information in the form of a spectral pyramid of multiple embedding spaces. In each spectral embedding space, we propose a graph attention mechanism to explicitly perform interpretable reasoning in the spatial domain.

11:10-11:30 Coffee Break
11:30-12:10 Session 17A: Deep Learning II
Chair:
Location: A
11:30
Depthwise Separable Convolutions and Variational Dropout within the context of YOLOv3
PRESENTER: Joseph Chakar

ABSTRACT. Deep learning algorithms have demonstrated remarkable performance in many sectors and have become one of the main foundations of modern computer-vision solutions. However, these algorithms often impose prohibitive levels of memory and computational overhead, especially in resource-constrained environments. In this study, we combine the state-of-the-art object-detection model YOLOv3 with depthwise separable convolutions and variational dropout in an attempt to bridge the gap between the superior accuracy of convolutional neural networks and the limited access to computational resources. We propose three lightweight variants of YOLOv3 by replacing the original network's standard convolutions with depthwise separable convolutions at different strategic locations within the network, and we evaluate their impacts on YOLOv3's size, speed, and accuracy. We also explore variational dropout: a technique that finds individual and unbounded dropout rates for each neural network weight. Experiments on the PASCAL VOC benchmark dataset show promising results, where variational dropout combined with the most efficient YOLOv3 variant leads to an extremely sparse solution that removes 95% of the baseline network's parameters at a relatively small drop of 3% in accuracy.
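A minimal sketch of the depthwise separable convolution block that replaces standard convolutions in approaches like this one (PyTorch; normalization and activation choices here are illustrative, not the authors' exact configuration):

```python
# A standard 3x3 conv uses c_in*c_out*9 weights; the depthwise (3x3, groups=c_in)
# plus pointwise (1x1) pair below uses only c_in*9 + c_in*c_out weights.
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, c_in, c_out, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(c_in, c_in, kernel_size=3, stride=stride,
                                   padding=1, groups=c_in, bias=False)
        self.pointwise = nn.Conv2d(c_in, c_out, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.LeakyReLU(0.1, inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))
```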

11:50
Uncertainty Estimates in Deep Generative Models using Gaussian Processes
PRESENTER: Kai Katsumata

ABSTRACT. We propose a new framework to estimate the uncertainty of deep generative models. In real-world applications, uncertainty allows us to evaluate the reliability of the outcome of machine learning systems. Gaussian processes are widely known as a machine learning method that provides estimates of uncertainty. Moreover, Gaussian processes have been shown to be equivalent to deep neural networks with infinitely wide layers. This equivalence suggests that Gaussian process regression can be used to perform Bayesian prediction with deep neural networks. However, existing Bayesian treatments of neural networks via Gaussian processes have so far only been applied to supervised learning; we are not aware of any work using neural networks and Gaussian processes for unsupervised learning. We extend the Bayesian Gaussian process latent variable model, an unsupervised learning method using Gaussian processes, and propose a Bayesian deep generative model by approximating the expectations of complex kernels. With a series of experiments, we validate that our method provides estimates of uncertainty by relating the predicted variance to the output quality.

11:30-12:10 Session 17B: Virtual Reality
Location: B
11:30
Passenger Anxiety about Virtual Driver Awareness During a Trip with a Virtual Autonomous Vehicle
PRESENTER: Christos Mousas

ABSTRACT. A virtual reality study concerning participants' anxiety levels when immersed in a virtual reality interaction with an autonomous vehicle was conducted. Five conditions were tested. The examined conditions are based on the awareness of the virtual character (driver). During the external awareness conditions, the virtual character either focuses on the road traffic or does not. During the internal awareness conditions, the virtual character either pays attention to the car or not. For the fifth condition, the virtual character is completely unaware, since a head-mounted display (HMD) is placed over his face. Results, implications, and limitations are discussed.

11:50
Investigating the Display Fidelity of Popular Head-Mounted Display Systems on Spatial Updating and Learning in VR
PRESENTER: Sabarish Babu

ABSTRACT. Often users in VR are required to make mental models, develop spatial awareness, and gain survey knowledge of the environment that they are exploring while learning the content of the simulation. This holds true for both educational and entertainment simulations where users explore large environments. In a between-subjects empirical evaluation, we examined the effect of the display fidelity of popular commercial head-mounted display systems based on display properties such as screen resolution, field of view, and screen size in three conditions along a display fidelity continuum, namely low fidelity, medium fidelity, and high fidelity. Our dependent variable was spatial updating, assessing survey knowledge by measuring the perceived orientation to previously visited landmarks when unseen, and content learning measured via pre- and post-test cognitive questionnaires created by domain experts based on Bloom's taxonomy of learning. In a VR simulation for geology education, participants explored a terrain, modeled after a segment of the Grand Canyon, collecting and testing rock samples. These landmarks were explored along a winding path through a realistic geological terrain, modeled based on Lidar and photogrammetry data. As the pathway through the Grand Canyon is distinctly sloping and varied, the task of pointing to the perceived location of landmarks in this environment provided rich insights into participants' survey knowledge and content learning, and how such knowledge differed between the display conditions. Insights and recommendations regarding the benefits of popular commercial VR head-mounted display systems are derived for developers and consumers.

11:30-12:10 Session 17C: Special Track: Computer Vision Advances in Geo-Spatial Applications and Remote Sensing
Chair:
Location: C
11:30
Semi-Supervised Fine-Tuning for Deep Learning Models in Remote Sensing Applications

ABSTRACT. A combined approach drawing on two well-known fields, deep learning and semi-supervised learning (SSL), is presented to tackle the land cover identification problem. The proposed methodology demonstrates the impact on the performance of deep learning models when SSL approaches are used as performance functions during training. The soft labels generated by SSL approaches over the encoded inputs, provided by stacked encoders, allow the utilization of the entire data set during the fine-tuning step for any deep classifier. The results obtained on pixel-level segmentation tasks over orthoimages suggest that SSL-enhanced loss functions can be beneficial to models' performance.

11:50
Scene Classification of Remote Sensing Images using convNet Features and Multi-grained Forest
PRESENTER: Ronald Tombe

ABSTRACT. Scene interpretation of remote sensing images entails effective extraction of spatial feature information and application of an appropriate pattern recognition algorithm for feature learning. In the literature, state-of-the-art results in remote sensing are attained by using pre-trained convolutional neural networks (convNets or CNNs) for transfer learning in deep feature extraction and then applying classifiers to learn the features for scene classification. This work proposes a method that utilizes the VGG-16 model for feature extraction and a multi-grained forest for feature learning and classification with majority voting over ensemble classifiers. The effectiveness of the proposed method is evaluated on the UCMerced and WHU-Siri public datasets. Improved classification results are attained with the proposed method compared to methods in the literature.
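A hedged sketch of the transfer-learning step this abstract describes: deep features from a pre-trained VGG-16 feeding a forest-style classifier. The paper uses a multi-grained forest with majority voting; a plain random forest and the dataset variable names below are placeholders for illustration only:

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing import image
from sklearn.ensemble import RandomForestClassifier

# Pre-trained VGG-16 without the classification head; global average pooling
# turns each scene image into a 512-dimensional descriptor.
extractor = VGG16(weights="imagenet", include_top=False, pooling="avg")

def deep_features(paths):
    batch = np.stack([image.img_to_array(image.load_img(p, target_size=(224, 224)))
                      for p in paths])
    return extractor.predict(preprocess_input(batch))

# X_train_paths / y_train etc. stand in for a scene dataset such as UC Merced.
clf = RandomForestClassifier(n_estimators=500)
clf.fit(deep_features(X_train_paths), y_train)
preds = clf.predict(deep_features(X_test_paths))
```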

12:10-13:30 Lunch Break
13:30-14:30 Session 18: Keynote - Kavita Bala
Chair:
Location: K
13:30
Understanding Visual Appearance from Micron to Global Scale

ABSTRACT. Augmented reality/mixed reality (AR/MR/XR) technologies are poised to create compelling and immersive user experiences by combining computer vision and computer graphics. Imagine users interacting with the world around them through their AR device. Visual search tells them what they are seeing, while computer graphics augments reality by overlaying real objects with virtual objects. AR/VR can have a far-ranging impact across many applications, such as retail, virtual prototyping, and entertainment.

In this talk, I will describe my group's research on these complementary areas: graphics models for realistic visual appearance, and visual search and fine-grained recognition for scene understanding. We will also see how these technologies can go beyond XR/AR/VR applications to enable visual discovery—using recognition as a core building block, we can mine social media images at a global scale to discover visual patterns and trends across geography and time.

14:40-15:40 Session 19A: Medical Image Analysis II
Chair:
Location: A
14:40
DeepTKAClassifier: Brand Classification of Total Knee Arthroplasty Implants using Explainable Deep Convolutional Neural Networks

ABSTRACT. Total knee arthroplasty (TKA) is one of the most successful surgical procedures worldwide. It improves quality of life, mobility, and functionality for the vast majority of cases with knee pain. However, a TKA surgery may fail over time for several reasons, and thus may require a revision arthroplasty procedure. Identifying the TKA implant is a critical consideration in the preoperative planning of revision surgery. This paper aims to develop, train, and validate deep convolutional neural network models to precisely classify four widely used TKA implants based only on plain knee radiographs. Using 9,052 computationally annotated knee radiographs, we achieved a weighted average precision, recall, and F1-score of 0.97, 0.97, and 0.97, respectively, with a Cohen's kappa of 0.96.

15:00
Multi-modal Image Fusion based on Weight Local Features and Novel Sum-Modified-Laplacian in Non-Subsampled Shearlet Transform Domain
PRESENTER: Hajer Ouerghi

ABSTRACT. Multi-modal medical image fusion plays a significant role in clinical applications like noninvasive diagnosis and image-guided surgery. However, designing an efficient image fusion technique is still a challenging task. In this paper, we propose an improved multi-modal medical image fusion method to enhance the visual quality and contrast of the fused image. To achieve this, the registered source images are first decomposed into low-frequency (LF) and several high-frequency (HF) sub-images via the non-subsampled shearlet transform (NSST). Afterward, LF sub-images are combined using the proposed weight local features fusion rule based on local energy and standard deviation, while HF sub-images are fused based on the novel sum-modified-Laplacian (NSML) technique. Finally, the inverse NSST is applied to reconstruct the fused image. Furthermore, the proposed method is extended to color multi-modal image fusion, which effectively restrains color distortion and enhances spatial and spectral resolutions. To evaluate the performance, various experiments were conducted on different datasets of gray-scale and color images. Experimental results show that the proposed scheme achieves better performance than other state-of-the-art algorithms in both visual effects and objective criteria.

15:20
Robust Prostate Cancer Classification with Siamese Neural Networks
PRESENTER: Alberto Rossi

ABSTRACT. Nuclear magnetic resonance (NMR) is a powerful and non-invasive diagnostic tool. However, NMR scanned images are often noisy due to patient motion or breathing. Although modern Computer Aided Diagnosis (CAD) systems, mainly based on Deep Learning (DL), together with expert radiologists, can obtain very accurate predictions, working with noisy data can induce a wrong diagnosis or require a new acquisition, costing time and exposing the patient to an extra dose of radiation. In this paper, we propose a new DL model, based on a Siamese neural network, able to withstand random noise perturbations. We use data from the ProstateX challenge and demonstrate the superior robustness of our model to random noise compared to a similar architecture deprived of the Siamese branch. In addition, our approach is also resistant to adversarial attacks and shows overall better AUC performance.

14:40-15:40 Session 19B: Vision for Robotics
Location: B
14:40
Simple Camera-to-2D-LiDAR Calibration Method for General Use
PRESENTER: Andrew Palmer

ABSTRACT. As systems that utilize computer vision move into the public domain, methods of calibration need to become easier to use. Though multi-plane LiDAR systems have proven useful for vehicles and large robotic platforms, many smaller platforms and low-cost solutions still require 2D LiDAR combined with RGB cameras. Current methods of calibrating these sensors make assumptions about camera and laser placement and/or require complex calibration routines. In this paper we propose a new method of feature correspondence between the two sensors and an optimization method capable of using a calibration target with unknown lengths in its geometry. Our system is designed with an inexperienced layperson as the intended user, which has led us to remove as many assumptions about both the target and the laser as possible. We show that our system is capable of calibrating the two-sensor system from a single sample in configurations other methods are unable to handle.

15:00
SalsaNext: Fast, Uncertainty-aware Semantic Segmentation of LiDAR Point Clouds
PRESENTER: Tiago Cortinhal

ABSTRACT. In this paper, we introduce SalsaNext for the uncertainty-aware semantic segmentation of a full 3D LiDAR point cloud in real-time. SalsaNext is the next version of SalsaNet which has an encoder-decoder architecture where the encoder unit has a set of ResNet blocks and the decoder part combines upsampled features from the residual blocks. In contrast to SalsaNet, we introduce a new context module, replace the ResNet encoder blocks with a new residual dilated convolution stack with gradually increasing receptive fields and add the pixel-shuffle layer in the decoder. Additionally, we switch from stride convolution to average pooling and also apply central dropout treatment. To directly optimize the Jaccard index, we further combine the weighted cross entropy loss with Lovasz-Softmax loss. We finally inject a Bayesian treatment to compute the epistemic and aleatoric uncertainties for each point in the cloud. We provide a thorough quantitative evaluation on the Semantic-KITTI dataset, which demonstrates that the proposed SalsaNext outperforms other state-of-the-art semantic segmentation networks and achieves 3.6% more accuracy over the previous state-of-the-art method. We also release our source code at http://github.com/tiagoCortinhal/SalsaNext.
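A hedged sketch of the combined loss idea mentioned above: weighted cross-entropy plus a Lovász-Softmax term to directly optimize the Jaccard index. The `lovasz_softmax` helper is assumed here (e.g., the public reference implementation by Berman et al.) and the weighting factor is illustrative:

```python
import torch
import torch.nn.functional as F

def combined_loss(logits, labels, class_weights, alpha=1.0):
    # logits: (B, C, H, W) network output; labels: (B, H, W) ground-truth class ids
    wce = F.cross_entropy(logits, labels, weight=class_weights)
    ls = lovasz_softmax(F.softmax(logits, dim=1), labels)  # assumed external helper
    return wce + alpha * ls
```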

15:20
Mobile Manipulator Robot Visual Servoing and Guidance for Dynamic Target Grasping
PRESENTER: Prateek Arora

ABSTRACT. This paper deals with the problem of real-time closed-loop tracking and grasping of a dynamic target by a mobile manipulation robot. Dynamic object tracking and manipulation is crucial for a robotic system that intends to physically interact with the real world. The robot considered corresponds to an eye-in-hand gripper-and-arm combination mounted on a four-wheel base. The proposed policy is inspired by the principles of visual servoing, and leverages a computationally simple paradigm of virtual force-based formulation, due to the intended deployment for real-time closed-loop control. The main objective of our strategy is to align a dynamic target frame to the onboard gripper while respecting the constraints of the mobile manipulator system. The algorithm was implemented on a real robot and evaluated across multiple diverse real-time experimental studies, detailed within this paper.

14:40-15:40 Session 19C: Statistical Pattern Recognition
Location: C
14:40
Interpreting Galaxy Deblender GAN from the Discriminator's Perspective
PRESENTER: Heyi Li

ABSTRACT. In large galaxy surveys, it can be difficult to separate overlapping galaxies, a process called deblending. Generative adversarial networks (GANs) have shown great potential in addressing this fundamental problem. However, it remains a significant challenge to comprehend how the network works, which is particularly difficult for non-expert users. This research focuses on understanding the behavior of one of the network's major components, the Discriminator, which plays a vital role but is often overlooked. Specifically, we propose an enhanced Layer-wise Relevance Propagation (LRP) algorithm called Polarized-LRP. It generates a heatmap-based visualization highlighting the area in the input image that contributes to the network decision. It consists of two parts, i.e., a positive contribution heatmap for the images classified as ground truth and a negative contribution heatmap for the ones classified as generated. As a use case, we have chosen the deblending of two overlapping galaxy images via a branched GAN model. Using the Galaxy Zoo dataset, we demonstrate that our method clearly reveals the attention areas of the Discriminator when differentiating generated galaxy images from ground truth images, and outperforms the original LRP method. To connect the Discriminator's impact on the Generator, we also visualize the attention shift of the Generator across the training process. An interesting result we have achieved there is the detection of a problematic data augmentation procedure that would otherwise have remained hidden. We find that our proposed method serves as a useful visual analytical tool for more effective training and a deeper understanding of GAN models.

15:00
Variational Bayesian Sequence to Sequence Networks for Memory-Efficient Sign Language Translation

ABSTRACT. Memory-efficient continuous Sign Language Translation is a significant challenge for the development of assisted technologies with real-time applicability for the deaf. In this work, we introduce a paradigm of designing recurrent deep networks whereby the output of the recurrent layer is derived from appropriate arguments from nonparametric statistics. A novel variational Bayesian sequence-to-sequence network architecture is proposed that consists of a) a full Gaussian posterior distribution for data-driven memory compression and b) a nonparametric Indian Buffet Process prior for regularization applied on the Gated Recurrent Unit non-gate weights. We dub our approach Stick-Breaking Recurrent network and show that it can achieve a substantial weight compression without diminishing modeling performance.

15:20
A Gaussian Process Upsampling Model for Improvements in Optical Character Recognition
PRESENTER: Steven Reeves

ABSTRACT. The automatic evaluation and extraction of financial documents is a key process in business efficiency. Most of the extraction relies on Optical Character Recognition (OCR), whose outcome is dependent on the quality of the document image. The image data fed to the automated systems can be of unreliable quality, inherently low-resolution, or downsampled and compressed by a transmitting program. In this paper, we illustrate a novel Gaussian Process (GP) upsampling model for the purposes of improving the OCR process and extraction through upsampling low-resolution documents.
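A hedged sketch of the general idea of GP-based upsampling: fit a Gaussian process to pixel intensities over (row, col) coordinates of a low-resolution patch and query it on a finer grid. The kernel, scale factor, and scikit-learn implementation below are illustrative assumptions, not the model described in the paper:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def gp_upsample(patch, factor=2):
    """Upsample a small grayscale patch by GP regression on pixel coordinates."""
    h, w = patch.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    X = np.column_stack([rows.ravel(), cols.ravel()])
    y = patch.ravel().astype(float)

    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel(1e-2),
                                  normalize_y=True)
    gp.fit(X, y)

    # Query the GP posterior mean on a grid `factor` times finer.
    fine_r, fine_c = np.meshgrid(np.linspace(0, h - 1, factor * h),
                                 np.linspace(0, w - 1, factor * w), indexing="ij")
    Xq = np.column_stack([fine_r.ravel(), fine_c.ravel()])
    return gp.predict(Xq).reshape(factor * h, factor * w)
```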

15:40-16:00 Coffee Break
16:00-18:00 Session 20: Poster Session II
Location: PL
16:00
Systematic Optimization of Image Processing Pipelines Using GPUs
PRESENTER: Peter Roch

ABSTRACT. Real-time computer vision systems require fast and efficient image processing pipelines. Experiments have shown that GPUs are highly suited for image processing operations, since many tasks can be processed in parallel. However, calling GPU-accelerated functions requires uploading the input parameters to the GPU's memory, calling the function itself, and downloading the result afterwards. In addition, since not all functions benefit from an increase in parallelism, many pipelines cannot be implemented exclusively using GPU functions. As a result, the optimization of pipelines requires a careful analysis of the achievable function speedup and the cost of copying data. In this paper, we first define a mathematical model to estimate the performance of an image processing pipeline. Thereafter, we present a number of micro-benchmarks gathered using OpenCV which we use to validate the model and which quantify the cost and benefits for different classes of functions. Our experiments show that comparing the function speedup without considering the time for copying can overestimate the achievable performance gain of GPU acceleration by a factor of two. Finally, we present a tool that analyzes the possible combinations of CPU and GPU function implementations for a given pipeline and computes the most efficient composition. By using the tool on their target hardware, developers can easily apply our model to optimize their application performance systematically.
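A toy version of the cost-model idea in this abstract: a GPU implementation of a pipeline stage only pays off when the kernel speedup outweighs the host-to-device copy time. The numbers and function names below are illustrative, not measurements from the paper:

```python
def gpu_stage_time(t_upload, t_kernel, t_download):
    # Total GPU cost includes copying the inputs up and the result back.
    return t_upload + t_kernel + t_download

def pick_implementation(t_cpu, t_upload, t_kernel, t_download):
    t_gpu = gpu_stage_time(t_upload, t_kernel, t_download)
    return ("GPU", t_gpu) if t_gpu < t_cpu else ("CPU", t_cpu)

# Example: a large kernel speedup can vanish once copies are counted.
print(pick_implementation(t_cpu=6.0, t_upload=3.0, t_kernel=0.8, t_download=3.0))
# -> ('CPU', 6.0)
```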

16:00
A Hybrid Approach for Improved Image Similarity Using Semantic Segmentation
PRESENTER: Achref Ouni

ABSTRACT. Content-Based Image Retrieval (CBIR) is the task of finding images in a dataset that are similar to an input query based on its visual characteristics. Several state-of-the-art methods, based on visual features (bag of visual words, VLAD, ...) or recent deep learning methods, try to solve the CBIR problem. In particular, deep learning is a new field used for several vision applications including CBIR. But even with the increase in performance of deep learning algorithms, this problem remains a challenge in computer vision. To tackle it, we present in this paper an efficient CBIR framework based on the incorporation of deep learning based semantic segmentation and visual features. We show experimentally that this incorporation leads to an increase in the accuracy of our CBIR framework. We study the performance of the proposed approach on four different datasets (Wang, MSRC V1, MSRC V2, Linnaeus).

16:00
Automated classification of Parkinson's Disease using Diffusion Tensor Imaging Data
PRESENTER: Harsh Sharma

ABSTRACT. Parkinson's Disease (PD) is one of the most common neurological disorders in the world, affecting over 6 million people globally. In recent years, Diffusion Tensor Imaging (DTI) biomarkers have been established as one of the leading techniques to help diagnose the disease. However, identifying patterns and deducing even preliminary results requires a neurologist to analyze the scan. In this paper, we propose a machine learning based algorithm that can analyze DTI data and predict whether a person has PD. We obtained a classification accuracy of 80% and an F1 score of 0.833 using our approach. The proposed method is expected to reduce the number of misdiagnoses by assisting neurologists in making a decision.

16:00
Nonlocal Adaptive Biharmonic Regularizer for Image Restoration
PRESENTER: Ying Wen

ABSTRACT. In this paper, we propose a nonlocal adaptive biharmonic regularization term for image denoising and restoration, combining the advantages of fourth-order models (no staircase effects and slope preservation) and nonlocal methods (texture preservation). The theoretical analysis of the proposed model is briefly discussed. For its numerical solution, we employ $L^2$ gradient descent and finite difference schemes and design explicit, semi-implicit, and implicit schemes. Numerical results for denoising and restoration are shown on synthetic images, real images, and texture images.

16:00
A Robust Approach to Plagiarism Detection in Handwritten Documents
PRESENTER: Om Pandey

ABSTRACT. Plagiarism detection is a widely used technique to uniquely identify the quality of work. We address in this paper the problem of predicting similarities among a collection of documents. This technique has widespread uses in academic institutions. In this paper, we propose a simple yet effective method for the detection of plagiarism by using a robust word detection and segmentation procedure followed by a convolutional neural network (CNN) - bidirectional Long Short Term Memory (biLSTM) pipeline to extract the text. Our approach also extracts and encodes common patterns like scratches in handwriting to improve accuracy on real-world use cases. The information extracted from multiple documents is then compared using similarity metrics to find the documents that have been plagiarized from a source. Extensive experiments in our research show that this approach may help simplify the examining process and can act as a cheap, viable alternative to many modern approaches used to detect plagiarism in handwritten documents.

16:00
Optical Coherence Tomography Latent Fingerprint Image Denoising

ABSTRACT. Latent fingerprints are fingerprint impressions left on the surfaces a finger comes into contact with. They are found in almost every crime scene. Conventionally, latent fingerprints have been obtained using chemical or physical methods, both of which are destructive techniques. The forensic community is moving towards contact-less acquisition methods. Contact-less acquisition presents some advantages over destructive methods; such advantages include multiple acquisitions of the sample and the possibility of further analysis such as touch DNA. This work proposes a speckle-noise denoising method for optical coherence tomography latent fingerprint images. The proposed denoising technique was derived from adaptive thresholding and normal shrinkage. Experimental results show that the proposed method suppresses speckle noise better than the adaptive threshold, NormalShrink, VisuShrink, SUREShrink and BayesShrink.

16:00
CNN, Segmentation or Semantic Embeddings: Evaluating Scene Context for Trajectory Prediction
PRESENTER: Arsal Syed

ABSTRACT. For autonomous vehicles (AVs) and social robots' navigation, it is important that they completely understand their surroundings for natural and safe interactions. While it is often recognized that scene context is important for understanding pedestrian behavior, it has received less attention than modeling the influence of social context arising from interactions between pedestrians. In this paper, we evaluate the effectiveness of various scene representations for trajectory prediction. Our work focuses on characterizing the impact of scene representations (semantic images vs. semantic embeddings) and scene quality (competing semantic segmentation networks). We leverage a hierarchical RNN autoencoder to encode historical pedestrian motion, their social interaction, and scene semantics into a low-dimensional subspace and then decode to generate future motion predictions. Experimental evaluation on the ETH and UCY datasets shows that using full scene semantics, specifically segmented images, can improve trajectory prediction over using just embeddings.

16:00
Automatic Extraction of Joint Orientations in Rock Mass using PointNet and DBSCAN

ABSTRACT. Measurement of joint orientation is an essential task for rock mass discontinuity characterization. This work presents a methodology for automatic extraction of joint orientations in a rock mass from 3D point cloud data generated using Unmanned Aerial Vehicles and photogrammetry. Our algorithm first automatically classifies joints on the 3D rock surface using the state-of-the-art deep network architecture PointNet. It then identifies individual joints by Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and computes their orientations by fitting least-squares planes using Random Sample Consensus. A major case study has been developed to evaluate the performance of the entire methodology. Our results show the proposed approach outperforms similar approaches in the literature both in terms of accuracy and time complexity. Our experiments show the great potential of applying 3D deep learning techniques to discontinuity characterization, which might be used for the estimation of other parameters as well.
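A hedged sketch of the clustering and orientation step described above: DBSCAN groups the points labelled as joints, and a least-squares plane fitted to each cluster yields its normal. The RANSAC refinement and the exact parameters used in the paper are omitted; `eps` and `min_samples` below are illustrative:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def joint_orientations(joint_points, eps=0.1, min_samples=50):
    """joint_points: (N, 3) array of points classified as joints."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(joint_points)
    normals = {}
    for k in set(labels) - {-1}:                 # label -1 is DBSCAN noise
        pts = joint_points[labels == k]
        centered = pts - pts.mean(axis=0)
        # The singular vector with the smallest singular value of the centered
        # points is the least-squares plane normal of the cluster.
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        normals[k] = vt[-1]
    return normals
```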

16:00
Feature Map Retargeting to Classify Biomedical Journal Figures

ABSTRACT. In this work, we introduce a layer to retarget feature maps of convolutional neural networks (CNNs). Our "Retarget" layer densely samples values from feature maps at the locations inferred through our proposed spatial attention regressor. Our layer extends an existing saliency-based distortion layer by replacing its convolutional components with depthwise convolutions along with a set of additional learnable parameters. The aforementioned reformulations, and the tuning of a few hyper-parameters, make the Retarget layer applicable at any depth of a feed-forward CNN. Keeping in spirit with retargeting methods used in Content-Aware Image Resizing, we introduce our layer at the bottlenecks of different pre-trained network architectures. We validate our layer on the ImageCLEF2013, ImageCLEF2015, and ImageCLEF2016 subfigure classification tasks. The redesigned DenseNet121 model with the Retarget layer achieves state-of-the-art results under the visual category when no data augmentations are performed. Performing spatial sampling at deeper layers increases computational cost and memory requirements exponentially. To address this, we experiment with an approximation of nearest neighbor interpolation and show consistent improvement over the baseline models and other state-of-the-art attention models, demonstrating our layer's broad applicability. The code will be publicly available.

16:00
Automatic 3D Object Detection from RGB-D data using PU-GAN
PRESENTER: Xueqing Wang

ABSTRACT. 3D object detection from RGB-D data in outdoor scenes is crucial in various industrial applications such as autonomous driving, robotics, etc. However, the points obtained from range sensor scans are usually sparse and non-uniform, which seriously limits detection performance. By learning a rich variety of point distributions from the latent space, we believe that 3D upsampling techniques may fill in the missing knowledge due to the sparsity of the 3D points. Hence, a 3D object detection method using 3D upsampling techniques is presented in this paper. The main contributions of the paper are two-fold. First, based on the Frustum PointNets pipeline, a 3D object detection method using PU-GAN has been implemented. A state-of-the-art 3D upsampling method, PU-GAN, is used to complement the sparsity of the point cloud. Second, some effective strategies have been proposed to improve the detection performance using upsampled dense points. Extensive experimental results on the KITTI benchmark show that the impact of PU-GAN upsampled points on object detection is closely related to the object distances from the camera. They show their superiority when applied to objects located around 30 meters away. By carefully designing the criteria to employ the upsampled points, the developed method outperforms the baseline Frustum PointNets by a large margin for pedestrian and cyclist objects.

16:00
Nodule Generation of Lung CT Images using a 3D Convolutional LSTM Network
PRESENTER: Kolawole Olulana

ABSTRACT. In the US, the American Cancer Society report for 2020 estimates about 228,820 new cases which could result in 135,720 deaths, which translates to 371 deaths per day compared to the overall daily cancer death toll of 1,660. The Cancer Association of South Africa (CANSA) reports that lung cancer and other chronic lung diseases are leading causes of death nationally. Research in this area is necessary in order to reduce the number of reported deaths through early detection and diagnosis. A number of studies have used datasets of Computed Tomography (CT) images in diagnosis and prognosis by oncologists, radiologists and medical professionals in the healthcare sector, and a number of machine learning methods have been developed using convolutional neural networks (CNNs) for feature extraction and binary classification, with just a few studies making use of combined (hybrid) methods that have shown the capability to increase performance and accuracy in the prediction and detection of early-stage onset of lung cancer. In this paper, a combined model is proposed using 3D images as input to a combination of a CNN and a long short-term memory (LSTM) network, which is a type of recurrent neural network (RNN). The increased need for computational resources that hybridization often brings is addressed by improving the nodule generation to focus only on the search space around the lung nodules; the proposed model requires fewer computational resources by avoiding the need to feed the whole 3D CT image into the network, so that only the region of interest near candidate regions with nodules is pre-processed. The results of a traditional CNN architecture are compared to this combined 3D convolutional LSTM for nodule generation. In the experiments, the proposed hybrid model outperforms the traditional CNN architecture, which shows how much improvement a hybridization of suitable models can contribute to lung cancer research.

16:00
Deep Learning Prediction of Glaucoma Progression with Macular Optical Coherence Tomography
PRESENTER: Serhat Sahin

ABSTRACT. Glaucoma is an eye disease that results in irreversible vision loss and is the second leading cause of blindness worldwide. Monitoring glaucoma patients for signs of progression and slowing the decay rate is the ultimate goal of glaucoma treatment. Clinicians depend on retinal structural information obtained with optical coherence tomography for tracking disease progression. In this work, we built two learning-based generative models using a conditional GAN architecture to predict glaucoma progression over time by reconstructing macular cross-sectional images from three or two prior measurements separated by six-month intervals, with no constraints on the stage of the disease at baseline. A total of 2,379 predictions were made for eight patients based on the previous three visits, and the predicted images demonstrated high similarity to the ground truth images with an SSIM of 0.8325. A total of 3,111 predictions were made based on two prior visits, resulting in an SSIM of 0.8336. This indicates that only two visits may actually be sufficient to make accurate predictions of glaucoma progression.
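A minimal sketch of how SSIM figures like those above could be computed with scikit-image; the actual evaluation pipeline of the paper is not shown, and the function and variable names are illustrative:

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def mean_ssim(predicted_scans, ground_truth_scans):
    """Average SSIM over pairs of predicted and ground-truth cross-sectional images."""
    scores = [ssim(p, g, data_range=g.max() - g.min())
              for p, g in zip(predicted_scans, ground_truth_scans)]
    return float(np.mean(scores))
```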

16:00
Semantic Segmentation with Peripheral Vision

ABSTRACT. Deep convolutional neural networks exhibit exceptional performance on many computer vision tasks, including semantic segmentation. Pre-trained networks trained on a relevant and large benchmark have a notable impact on these successes. However, when confronting a domain shift, pre-trained encoders cannot boost the performance of a model. In general, transfer learning is not a universal solution for different fields of science with small accessible datasets. An alternative approach is to develop stronger network models applicable to any problem, rather than forcing scientists to explore the encoders available in other literature for their particular problems. To shift the research trend in semantic segmentation toward more effective models, we propose an innovative convolutional module simulating the peripheral vision of the human eye. By utilizing our module in an encoder-decoder configuration, after extensive experiments, we achieve better outcomes on several challenging benchmarks, including PASCAL VOC2012 and CamVid.

16:00
Deep Facial Expression Recognition with Occlusion Regularization
PRESENTER: Nikul Pandya

ABSTRACT. In computer vision, occlusions are mainly known as a challenge to cope with. For instance, partial occlusions of the face may lower the performance of facial expression recognition systems. However, when incorporated into the training, occlusions can also be helpful in improving the overall performance. In this paper, we propose and evaluate occlusion augmentation as a simple but effective regularizing tool for improving the general performance of deep learning based facial expression and action unit recognition systems, even if no occlusion is present in the test data. In our experiments we consistently found significant performance improvements on three databases (Bosphorus, RAF-DB, and AffectNet) and three CNN architectures (Xception, MobileNet, and a custom model), suggesting that occlusion regularization works independently of the dataset and architecture. Based on our clear results, we strongly recommend integrating occlusion regularization into the training of all CNN-based facial expression recognition systems, because it promises performance gains at very low cost.
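A minimal sketch of occlusion augmentation as a training-time regularizer: a random rectangle of the input image is blanked out. The patch-size limit and fill value below are illustrative, not the exact scheme used in the paper:

```python
import numpy as np

def random_occlusion(img, max_frac=0.4, fill=0, rng=np.random):
    """Return a copy of img with one random rectangle set to a constant value."""
    h, w = img.shape[:2]
    oh = rng.randint(1, int(h * max_frac))
    ow = rng.randint(1, int(w * max_frac))
    y, x = rng.randint(0, h - oh), rng.randint(0, w - ow)
    out = img.copy()
    out[y:y + oh, x:x + ow] = fill
    return out
```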

16:00
Generator From Edges: Reconstruction of Facial Images
PRESENTER: Nao Takano

ABSTRACT. Applications that involve supervised training require paired images. Researchers of single image super-resolution (SISR) create such images by artificially generating blurry input images from the corresponding ground truth. Similarly, we can create paired images with the Canny edge detector. We propose Generator From Edges (GFE) [Figure 1]. Our aim is to determine the best architecture for GFE, along with a review of perceptual loss [1, 2]. To this end, we conducted three experiments. First, we explored the effects of the adversarial loss often used in SISR. In particular, we uncovered that it is not an essential component of a perceptual loss. Eliminating the adversarial loss leads to a more effective architecture from the perspective of hardware resources. It also means that considerations for the problems pertaining to generative adversarial networks (GANs) [3], such as mode collapse, are not necessary. Second, we reexamined VGG loss and found that the mid-layers yield the best results. By extracting the full potential of VGG loss, the overall performance of perceptual loss improves significantly. Third, based on the findings of the first two experiments, we reevaluated the dense network to construct GFE. Using GFE as an intermediate process, reconstructing a facial image from a pencil sketch can become an easy task.

16:00
CD2: Combined Distances of Contrast Distributions for Image Quality Analysis
PRESENTER: Sascha Xu

ABSTRACT. The quality of visual input impacts both human and machine perception. Consequently many processing techniques exist that deal with different distortions. Usually they are applied freely and unsupervised. We propose a novel method called CD2 to protect against errors that arise during image processing. It is based on distributions of image contrast and custom distance functions which capture the effect of noise, compression, etc. CD2 achieves excellent performance on image quality analysis benchmarks and in a separate user test with only a small data and computation overhead.

16:00
Real-Time Person Tracking and Association on Doorbell Cameras
PRESENTER: Sung Chun Lee

ABSTRACT. This paper presents key techniques for real-time, multi-person tracking and association on doorbell surveillance cameras at the edge. The challenges for this task are: significant person size changes during tracking caused by person approaching or departing from the doorbell camera, person occlusions due to limited camera field and occluding objects in the camera view, and the requirement for a lightweight algorithm that can run in real time on the doorbell camera at the edge. To address these challenges, we propose a multi-person tracker that uses a detect-track-associate strategy to achieve good performance in speed and accuracy. The person detector only runs at every n-th frame, and between person detection frames a low-cost point-based tracker is used to track the subjects. To maintain subject tracking accuracy, at each person detection frame, a person association algorithm is used to associate persons detected in the current frame to the current and recently tracked subjects and identify any new subjects. To improve the performance of the point-based tracker, human-shaped masks are used to filter out background points. Further, to address the challenge of drastic target scale change during the tracking we introduced an adaptive image resizing strategy to dynamically adjust the tracker input image size to allow the point-based tracker to operate at the optimal image resolution given a fixed number of feature points. For fast and accurate person association, we introduced the Sped-Up LOMO, a fast version of the popular local maximal occurrence (LOMO) person descriptor. The experimental results on doorbell surveillance videos illustrate the efficacy of the proposed person tracking and association framework.

16:00
MySnapFoodLog: Culturally Sensitive Food Photo-Logging App for Dietary Biculturalism Studies
PRESENTER: Paul Stanik III

ABSTRACT. It is believed that immigrants to the U.S. have increased rates of chronic diseases due to their adoption of the Western diet. There is a need to better understand the dietary intake of these immigrants. Tracking food consumption can be easily done by using a food app, but there is currently no culturally-appropriate food tracking app that is relatively easy for participants and the research community to use. The MySnapFoodLog app was developed using the cross-platform Flutter framework to track users' food consumption, with the goal of using AI to recognize Filipino foods and determine if a meal is healthy or unhealthy. A pilot study demonstrates the feasibility of the app alpha release and the need for further data collection and training to improve the Filipino food recognition system.

16:00
Hand Gesture Recognition Based on the Fusion of Visual and Touch Sensing Data
PRESENTER: Frans Timbane

ABSTRACT. The use of computers has evolved so rapidly that our daily lives revolve around them. With the advancement of computer science and technology, interaction between humans and computers is no longer limited to mice and keyboards. Whole-body interaction is the trend supported by the newest techniques. Hand gesture recognition is becoming more and more common; however, it is challenged by lighting conditions, limited hand movements, and occlusion of the hand images. The objective of this paper is to reduce those challenges by fusing vision and touch sensing data to accommodate the requirements of advanced human-computer interaction. In the development of this system, vision and touchpad sensing data were used to detect the fingertips using machine learning. The fingertip detection results were fused by a K-nearest neighbor classifier to form the proposed hybrid hand gesture recognition system. The classifier is then trained to classify four hand gestures. The classifier was tested in three different scenarios with static, slow, and fast movement of the hand. The overall performance of the system on both static and slow-moving hands is 100% precision for both training and testing sets, with a 0% false-positive rate. In the fast-moving hand scenario, the system achieved 95.25% accuracy, 94.59% precision, 96% recall, and a 5.41% false-positive rate. Finally, using the proposed classifier, a real-time, simple, accurate, reliable, and cost-effective system was realized to control the Windows Media Player. Fusing the two input sensors offered better precision and recall performance of the system.
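A hedged sketch of the fusion-and-classification idea: fingertip features from the camera and the touchpad are concatenated and classified with k-nearest neighbours. Feature extraction itself, the training variables, and the value of k are placeholders, not the paper's configuration:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def fuse(vision_feats, touch_feats):
    # One joint feature vector per sample from the two sensing modalities.
    return np.hstack([vision_feats, touch_feats])

# vision_train / touch_train / gesture_labels_train are placeholder arrays
# for the four-gesture training data.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(fuse(vision_train, touch_train), gesture_labels_train)
predicted = knn.predict(fuse(vision_test, touch_test))
```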

16:00
Gastrointestinal Tract Anomaly Detection from Endoscopic Videos using Object Detection Approach
PRESENTER: Tejas Chheda

ABSTRACT. Endoscopy is a medical procedure used for the imaging and examination of internal body organs to detect, visualize, and localize anomalies and facilitate their further treatment. Currently, medical practitioners' expertise is heavily relied upon to analyse these endoscopic videos. This can be a bottleneck in rural areas where specialized medical practitioners are scarce. By learning from and improving upon existing research, the proposed system leverages object detection methods to provide an automated detection mechanism with real-time annotations, assisting medical professionals performing endoscopy and providing insights for educational purposes. It works by extracting video frames and processing them with a real-time object detection deep learning model trained on a standard dataset to detect two anomalies: esophagitis and polyps. The output is an annotated video. Using the Intersection over Union (IoU) metric, the model is observed to perform accurately on the training set but shows lower accuracy on the test set of images. This, however, can be improved using alternative metrics that are better suited to irregularly shaped, multi-class, multiple-object detection and can better explain the observed results.
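
For reference, the Intersection over Union (IoU) metric mentioned above can be computed for a pair of axis-aligned boxes as follows; the corner-format (x1, y1, x2, y2) convention used here is an assumption.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

print(iou((10, 10, 50, 50), (30, 30, 70, 70)))  # approx. 0.143
```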

16:00
A multimodal high level video segmentation for content targeted online advertising
PRESENTER: Ruxandra Tapu

ABSTRACT. In this paper, we introduce a novel, content-targeted advertisement system dedicated to online video platforms. The proposed framework is designed from the viewer’s perspective, in terms of commercial contextual relevance and degree of intrusiveness. The major contribution of the paper is a multimodal video temporal segmentation algorithm based on visual and audio cues, extracted using deep neural network architectures, that optimally generates semantically connected clusters of shots. The experimental evaluation performed on a dataset of 30 video documents validates the proposed methodology with precision and recall scores above 85%.

16:00
AI Playground: Unreal Engine-based Data Ablation Tool for Deep Learning
PRESENTER: Mehdi Mousavi

ABSTRACT. Machine learning requires data, but acquiring and labeling real-world data is challenging, expensive, and time-consuming. More importantly, it is nearly impossible to alter real data post-acquisition (e.g., change the illumination of a room), making it very difficult to measure how specific properties of the data affect performance. In this paper, we present AI Playground (AIP), an open-source, Unreal Engine-based tool for generating and labeling virtual image data. With AIP, it is trivial to capture the same image under different conditions (e.g., fidelity, lighting, etc.) and with different ground truths (e.g., depth or surface normal values). AIP is easily extendable and can be used with or without code. To validate our proposed tool, we generated eight datasets of otherwise identical but varying lighting and fidelity conditions. We then trained deep neural networks to predict (1) depth values, (2) surface normals, or (3) object labels and assessed each network's intra- and cross-dataset performance. Among other insights, we verified that sensitivity to different settings is problem-dependent. We confirmed the findings of other studies that segmentation models are very sensitive to fidelity, but we also found that they are just as sensitive to lighting. In contrast, depth and normal estimation models seem to be less sensitive to fidelity or lighting and more sensitive to the structure of the image. Finally, we tested our trained depth-estimation networks on two real-world datasets and obtained results comparable to training on real data alone, confirming that our virtual environments are realistic enough for real-world tasks.

16:00
Homework Helper: Providing Valuable Feedback on Math Mistakes
PRESENTER: Sara Davis

ABSTRACT. Many parents feel uncomfortable helping their children with homework, with only 66% of parents consistently checking their child's homework. Because of this, many turn to math games and problem solvers as they have become widely available in recent years. Many of these applications rely on multiple choice or keyboard entry submission of answers, limiting their adoption. Auto graders and applications, such as PhotoMath, deprive students of the opportunity to correct their own mistakes, automatically generating a solution with no explanation. This work introduces a novel homework assistant -- Homework Helper (HWHelper) -- that is capable of determining mathematical errors in order to provide meaningful feedback to students without solutions. In this paper, we focus on simple arithmetic calculations, specifically multi-digit addition, introducing 2D-Add, a new dataset of worked addition problems. We design a system that acts as a guided learning tool for students, allowing them to learn from and correct their mistakes. HWHelper segments a sheet of math problems, identifies the student's answer, performs the arithmetic, and pinpoints mistakes made, providing feedback to the student. HWHelper fills a significant gap in the current state-of-the-art for student math homework feedback.
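
A toy illustration of the kind of per-column check such a system could perform for multi-digit addition; this generic reconstruction compares the student's answer with the correct sum digit by digit and is not HWHelper's actual feedback logic.

```python
def check_addition(a, b, student_answer):
    """Return the (right-to-left) digit positions where the student's answer
    disagrees with the correct sum of a + b; an empty list means it is correct."""
    correct, student = str(a + b), str(student_answer)
    width = max(len(correct), len(student))
    correct, student = correct.zfill(width), student.zfill(width)
    return [i for i in range(width)
            if correct[width - 1 - i] != student[width - 1 - i]]

# 347 + 485 = 832; answering 822 suggests a forgotten carry in the tens column
print(check_addition(347, 485, 822))   # [1] -> the tens digit is wrong
```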

16:00
Interface Design for HCI Classroom: From Learners' Perspective
PRESENTER: Huyen N. Nguyen

ABSTRACT. Having a good Human-Computer Interaction (HCI) design is challenging. Previous works have contributed significantly to fostering HCI, including design principles reported from the instructor's view. The questions of how and to what extent students perceive the design principles are still left open. To answer this question, this paper conducts a study of HCI adoption in the classroom. The studio-based learning method was adapted to teach 83 graduate and undergraduate students over 16 weeks with four activities. A standalone presentation tool for instant online peer feedback during the presentation session was developed to help students justify and critique others' work. Our tool provides a sandbox, which supports multiple application types, including Web applications, Object Detection, Web-based Virtual Reality (VR), and Augmented Reality (AR). After presenting one assignment and two projects, our results showed that students acquired a better understanding of the Golden Rules principles over time, as demonstrated by the development of their visual interface designs. The word cloud reveals that the primary focus was on the user interface and sheds some light on students' interest in user experience. The inter-rater score indicates agreement among students that they share the same level of understanding of the principles. The results show a high level of compliance with HCI design guidelines, within which we observed variations in visual cognitive styles. Regardless of diversity in visual preference, the students showed high consistency and a similar perspective on adopting HCI design principles. The results also elicited suggestions for the development of the HCI curriculum in the future.

16:00
Pre-trained Convolutional Neural Network for the Diagnosis of Tuberculosis

ABSTRACT. Tuberculosis (TB) is an infectious disease that claimed about 1.5 million lives in 2018. TB is most prevalent in developing regions. Even though TB is curable, early detection is necessary to prevent its spread and casualties. Chest radiographs are one of the most reliable screening techniques, although their accuracy depends on professional radiologists' interpretation of the individual images. Consequently, we present a computer-aided detection system that uses a pre-trained convolutional neural network as a feature extractor and a logistic regression classifier to automatically analyze chest radiographs and provide a timely and accurate interpretation of multiple images. The chest radiographs were pre-processed before extracting distinctive features, which were then fed to the classifier to detect which images are infected. This work establishes the potential of applying pre-trained convolutional neural network models in the medical domain to obtain good results despite limited datasets.
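
A minimal sketch of the pipeline described above (pre-trained CNN as feature extractor, logistic regression as classifier); ResNet-18, the 224x224 input size, and the synthetic placeholder data are assumptions, since the abstract does not specify the backbone or preprocessing.

```python
import torch
import torchvision
from sklearn.linear_model import LogisticRegression

# ResNet-18 stands in for the (unspecified) pre-trained CNN used in the paper.
backbone = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()          # keep the 512-d pooled features
backbone.eval()

def extract_features(images):              # images: (N, 3, 224, 224) pre-processed radiographs
    with torch.no_grad():
        return backbone(images).numpy()

# Synthetic placeholders in place of the real chest-radiograph dataset.
train_images, train_labels = torch.rand(8, 3, 224, 224), [0, 1] * 4   # 0 = healthy, 1 = TB
clf = LogisticRegression(max_iter=1000).fit(extract_features(train_images), train_labels)
print(clf.predict(extract_features(torch.rand(2, 3, 224, 224))))
```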

16:00
Near-Optimal Concentric Circles Layout: Visualizing Extensive Connectivity Structures of Scale-Free Graphs

ABSTRACT. The majority of graph visualization algorithms emphasize improving the readability of graphs by focusing on various vertex and edge rendering techniques. However, revealing the global connectivity structure of a graph by identifying significant vertices is an important and useful part of any graph analytics system. Centrality measures reveal the "most important" vertices of a graph, commonly referred to as central or influential vertices. Hence, a centrality-oriented visualization may highlight these important vertices and give deep insights into graph data. This paper proposes a mathematical optimization-based clustered graph layout called the Near-Optimal Concentric Circles (NOCC) layout to visualize medium to large scale-free graphs. We cluster the vertices by their betweenness values and optimally place them on concentric circles to reveal the extensive connectivity structure of the graph while achieving aesthetically pleasing layouts. In addition, we incorporate different edge rendering techniques to improve graph readability and interaction.
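
A simplified sketch of the layout idea: rank vertices by betweenness centrality and place each rank band on its own concentric circle. The binning rule and radii below are assumptions, and the paper's optimization step for vertex placement is omitted.

```python
import math
import networkx as nx

def concentric_layout(G, n_circles=3):
    """Assign (x, y) positions: most central vertices on the innermost circle."""
    bc = nx.betweenness_centrality(G)
    ranked = sorted(G.nodes, key=lambda v: -bc[v])
    band = max(1, math.ceil(len(ranked) / n_circles))
    pos = {}
    for c in range(n_circles):
        ring = ranked[c * band:(c + 1) * band]
        radius = c + 1
        for i, v in enumerate(ring):
            angle = 2 * math.pi * i / max(1, len(ring))
            pos[v] = (radius * math.cos(angle), radius * math.sin(angle))
    return pos

print(concentric_layout(nx.barabasi_albert_graph(30, 2)))
```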

16:00
Facial Expression Recognition and Ordinal Intensity Estimation: A Multilabel Learning Approach

ABSTRACT. Facial Expression Recognition has gained considerable attention in the field of affective computing, but only a few works have considered the intensity of the emotion embedded in the expression. The available studies on expression intensity estimation assign a nominal/regression value or classify emotion into a range of intervals. Multiclass approaches and their extensions do not conform to the human heuristic manner of recognising an emotion together with its intensity. This work presents a multi-label CNN-based model that can simultaneously recognise an emotion and provide an ordinal metric for its intensity. In experiments conducted on the BU-3DFE and Cohn-Kanade (CK+) datasets, we examine how well our model adapts and generalises. Our model gives promising results on multilabel evaluation metrics and generalises well when trained on BU-3DFE and evaluated on CK+.

16:00
Unsupervised Anomaly Detection of the First Person in Gait from an Egocentric Camera
PRESENTER: Mana Masuda

ABSTRACT. Assistive technology is increasingly important as the senior population grows. The purpose of this study is to develop a means of preventing fatal injury by monitoring the movements of the elderly and sounding an alarm if an accident occurs. We present a method for detecting an anomaly in a first person’s gait from an egocentric video. Following conventional anomaly detection methods, we train the model in an unsupervised manner. We use optical flow images to capture the ego-motion information of the first person. To verify the effectiveness of our model, we introduce a novel first-person video anomaly detection dataset and conduct experiments showing that our model outperforms the baseline method.

16:00
Prostate MRI Registration Using Siamese Metric Learning
PRESENTER: Alexander Lyons

ABSTRACT. The process of registering intra-procedural prostate magnetic resonance images (MRI) with corresponding pre-procedural images improves the accuracy of certain surgeries, such as prostate biopsy. Aligning the two images by means of rigid and elastic deformation may permit more precise use of the needle during the operation. However, gathering the necessary data and computing the ground truth is a problematic step. Currently, a single dataset is available and it is composed of only a few cases, making the training of standard deep convolutional neural networks difficult. To address this issue, the moving image (intra-procedural) is randomly augmented to produce different copies, and a convolutional siamese neural network tries to choose the copy best aligned with the reference image (pre-procedural). The results of this research show that this method is superior to both a simple baseline obtained with standard image processing techniques and a deep CNN model. Furthermore, the best policy found for building the pair set for the siamese network reveals that a rule based on mutual information that considers only the highest and lowest values, representing similar and dissimilar cases, is the best option for training. The use of mutual information allows the model to be unsupervised, since segmentation is no longer necessary. Finally, research on the size of the augmented set shows that producing 18 different candidates is sufficient for good performance.
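
A minimal sketch of the mutual-information-based pair selection rule described above, using a standard histogram-based MI estimator; treating the highest-MI augmented copy as the similar example and the lowest-MI copy as the dissimilar one follows the abstract, but the estimator and bin count are assumptions.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def mutual_information(img_a, img_b, bins=32):
    """Histogram-based MI between two intensity images (a common estimator)."""
    hist_2d, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    return mutual_info_score(None, None, contingency=hist_2d)

def select_pairs(fixed, augmented_copies):
    """Pick the most similar and most dissimilar copies w.r.t. the fixed image."""
    scores = [mutual_information(fixed, c) for c in augmented_copies]
    return (augmented_copies[int(np.argmax(scores))],
            augmented_copies[int(np.argmin(scores))])

# Synthetic illustration: shifted copies stand in for the random augmentations.
fixed = np.random.rand(64, 64)
copies = [np.roll(fixed, s, axis=1) for s in (1, 4, 16)]
similar, dissimilar = select_pairs(fixed, copies)
```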

16:00
Emotion Categorization from Video-frame Images using a Novel Sequential Voting Technique

ABSTRACT. Emotion recognition is the process of identifying different emotions in humans based on their facial expressions. It requires time, and it is sometimes hard for human classifiers to agree with each other on the emotion category of a facial expression. However, machine learning classifiers have performed well in classifying different emotions and have been widely used in recent years to facilitate the task of emotion categorization. Much research on emotion video databases uses only a few frames from when the emotion is expressed at its peak, which might not give good recognition accuracy when predicting frames where the emotion is less intense. In this paper, using the CK+ emotion dataset as an example, we use more frames to analyze emotion from mid and peak frame images and compare our results to a method using fewer peak frames. Furthermore, we propose an approach based on sequential voting and apply it to more frames of the CK+ database. Our approach resulted in up to 85.9% accuracy for the mid frames and an overall accuracy of 96.5% for the CK+ database, compared with accuracies of 73.4% and 93.8% from existing techniques.
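
A minimal illustration of aggregating per-frame predictions over a sequence by voting; the simple majority rule here is an assumption standing in for the paper's sequential voting scheme, and the per-frame labels would come from a trained classifier.

```python
from collections import Counter

def vote_over_sequence(frame_predictions):
    """Aggregate per-frame emotion labels into one label for the sequence."""
    label, _ = Counter(frame_predictions).most_common(1)[0]
    return label

# e.g. per-frame predictions from the mid to peak frames of one CK+ sequence
print(vote_over_sequence(["neutral", "happy", "happy", "surprise", "happy"]))  # "happy"
```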

17:00-18:00 Session 21: Poster Session II (pre-recorded)
Location: PR
17:00
Reducing Triangle Inequality Violations with Deep Learning and Its Application to Image Retrieval
PRESENTER: Izat Khamiyev

ABSTRACT. Given a distance matrix with triangle inequality violations, the metric nearness problem requires finding a closest matrix that satisfies the triangle inequality. It has been experimentally shown that deep neural networks can be used to efficiently produce close matrices with fewer triangle inequality violations. This paper further extends the deep learning approach to the metric nearness problem by applying it to content-based image retrieval. Since the vantage-space representation of an image database requires distances that satisfy the triangle inequality, applying deep learning to matrices in the vantage space with triangle inequality violations produces distance matrices with fewer violations. Experiments performed on the Corel-1k dataset demonstrate that fully convolutional autoencoders considerably reduce triangle inequality violations in distance matrices. Overall, the image retrieval accuracy based on the distance matrices generated by the deep learning model is better than that based on the original matrices 91.16% of the time.
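
For clarity, a brute-force check of how many triples in a distance matrix violate the triangle inequality; this O(n^3) sketch is only illustrative and unrelated to the paper's autoencoder.

```python
import numpy as np

def count_triangle_violations(D, tol=1e-9):
    """Count (i, j, k) triples in a symmetric distance matrix D that violate
    D[i, j] <= D[i, k] + D[k, j]."""
    n, violations = len(D), 0
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(n):
                if k != i and k != j and D[i, j] > D[i, k] + D[k, j] + tol:
                    violations += 1
    return violations

D = np.array([[0, 1, 5],
              [1, 0, 1],
              [5, 1, 0]], dtype=float)
print(count_triangle_violations(D))   # 1: D[0,2] = 5 > D[0,1] + D[1,2] = 2
```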

17:05
A Driver Guidance System to Support the Stationary Wireless Charging of Electric Vehicles

ABSTRACT. Air pollution is a problem in many cities. Although it is possible to mitigate this problem by replacing combustion with electric engines, at the time of writing, electric vehicles are still a rarity in European cities. Reasons for not buying an electric vehicle are not only the high purchase costs but also the inconvenient initiation of the charging process. A more convenient alternative is wireless charging, which is enabled by integrating an induction plate into the floor and installing a charging interface on the vehicle. To maximize efficiency, the vehicle’s charging interface must be positioned accurately above the induction plate integrated into the floor. Since the driver cannot perceive the region below the vehicle, it is difficult to precisely align the position of the charging interface by maneuvering the vehicle. In this paper, we first discuss the requirements for driver guidance systems that help drivers position their vehicle accurately and thus enable them to maximize the charging efficiency. Thereafter, we present a prototypical implementation of such a system. To minimize the deployment cost for charging station operators, our prototype uses an inexpensive off-the-shelf camera system to localize the vehicles that are approaching the station. To simplify the retrofitting of existing vehicles, the prototype uses a smartphone app to generate navigation visualizations. To validate the approach, we present several experiments indicating that, despite its low cost, the prototype can technically achieve the necessary precision and usability.

17:10
An Efficient Tiny Feature Map Network For Real-Time Semantic Segmentation
PRESENTER: Hang Huang

ABSTRACT. In this paper, we propose an efficient semantic segmentation network named Tiny Feature Map Network (TFMNet). This network significantly improves running speed while achieving good accuracy. Our scheme uses a lightweight backbone network to extract primary features from input images of particular sizes. The hybrid dilated convolution framework and the DenseASPP module are used to alleviate the gridding problem. We evaluate the proposed network on the Cityscapes and CamVid datasets and obtain performance comparable with existing state-of-the-art real-time semantic segmentation methods. Specifically, it achieves 72.9% mIoU on the Cityscapes test dataset with only 2.4M parameters and a speed of 113 FPS on an NVIDIA GTX 1080 Ti, without pre-training on the ImageNet dataset.

17:15
A Modified Syn2Real Network for Nighttime Rainy Image Restoration
PRESENTER: Qunfang Tang

ABSTRACT. The restoration or enhancement of rainy images at nighttime is of great significance to outdoor computer vision applications such as self-driving and traffic surveillance. While image deraining has drawn increasing research attention and the majority of deraining methods achieve satisfying performance for daytime rain removal, there are few related studies on nighttime image deraining, as nighttime rainy scenes are more complicated and challenging. To address the nighttime image deraining problem, we designed an improved model based on the Syn2Real network, called NIRR. To obtain good rain removal and visual quality in nighttime rainy scenes, we propose a new refined loss function for the supervised learning phase, which combines a perceptual loss and an SSIM loss. Qualitative and quantitative experimental results show that our proposed method outperforms the state of the art on both synthetic and real-world nighttime rainy images.
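
A hypothetical sketch of a refined loss combining a VGG-feature perceptual term with an SSIM term, as the abstract describes; the VGG layer cut-off, the weights alpha and beta, and the use of the third-party pytorch-msssim package are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F
import torchvision
from pytorch_msssim import ssim            # third-party "pytorch-msssim" package

# Frozen VGG-16 features serve as the perceptual-loss backbone.
vgg = torchvision.models.vgg16(weights=torchvision.models.VGG16_Weights.DEFAULT).features[:16].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def refined_loss(derained, clean, alpha=1.0, beta=0.5):
    perceptual = F.mse_loss(vgg(derained), vgg(clean))   # distance in feature space
    ssim_term = 1.0 - ssim(derained, clean, data_range=1.0)
    return alpha * perceptual + beta * ssim_term

x, y = torch.rand(1, 3, 128, 128), torch.rand(1, 3, 128, 128)
print(refined_loss(x, y))
```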

17:20
Unsupervised domain adaptation for person re-identification with few and unlabeled target data
PRESENTER: George Galanakis

ABSTRACT. Existing, fully supervised methods for person re-identification (ReID) require annotated data acquired in the target domain in which the method is expected to operate. This includes the IDs as well as images of persons in that domain. This is an obstacle to the deployment of ReID methods in novel settings. To solve this problem, semi-supervised or even unsupervised ReID methods have been proposed. Still, due to their assumptions and operational requirements, such methods are not easily deployable and/or prove less performant in novel domains/settings, especially those involving small person galleries. In this paper, we propose a novel approach for person ReID that alleviates these problems. This is achieved by a completely unsupervised method for fine-tuning the ReID performance of models learned in prior, auxiliary domains to new, completely different ones. The proposed model adaptation is achieved based on only a few unlabeled target persons' data. Extensive experiments investigate several aspects of the proposed method in an ablative study. Moreover, we show that the proposed method considerably improves the performance of state-of-the-art ReID methods on state-of-the-art datasets.

17:25
How Does Computer Animation Affect Our Perception Of Emotions in Video Summarization?
PRESENTER: Camila Kolling

ABSTRACT. With the exponential growth of film production and the popularization of the web, movie summarization has become a useful and important resource. Movie data specifically has become one of the most entertaining sources for viewers, especially during quarantine. However, browsing a movie in enormous collections and searching for a desired scene within a complete movie is a tedious and time-consuming task. As a result, automatic and personalized movie summarization has become a common research topic. In this paper, we focus on emotion summarization for single-shot videos and apply three independent methods for their summarization. We provide two different ways to visualize the main emotions of the generated summary and compare both approaches. The first one uses the original frames of the video, and the other uses an open-source facial animation tool to create a virtual assistant that presents the emotion summarization. For evaluation, we conducted an extrinsic evaluation using a questionnaire to measure the quality of each generated video summary. Experimental results show that, even though both videos received similar answers, a different technique produced the most satisfying and informative summary for each video.

17:30
Where's Wally: A Gigapixel Image Study for Face Recognition in Crowds

ABSTRACT. Several devices can capture images containing a large number of people, including high-resolution images known as gigapixel images. Such images can be helpful for studies and investigations, such as finding people in a crowd. They provide more detail, but identifying someone in the crowd remains a hard and challenging problem. In this paper, we aim to assist a human observer working with large crowd images by reducing the search space to a ranking of ten images related to a specific person. Our model collects faces from a crowded gigapixel image and then searches for people using three different poses (front, right, and left). To evaluate our method, we built a handcrafted dataset with 42 people and achieved a recognition rate of 69% over the whole dataset. We highlight that, of the 31% "not found" among the first ten in the ranking, many are very close to this boundary, and 92% of the non-matched faces are occluded by an accessory or by another face. The results show the great potential of our method to help a human observer find people in crowds, especially in cluttered images, by providing a reduced search space.

17:35
Optical Flow Based Background Subtraction with a Moving Camera: Application to Autonomous Driving

ABSTRACT. In this research, we present a novel algorithm for background subtraction with a moving camera. Our algorithm is based purely on visual information obtained from a camera mounted on an electric bus operating in downtown Reno, and it automatically detects moving objects of interest in order to provide collision-avoidance information and vehicle counts for an autonomous vehicle. In our approach, we exploit the optical flow vectors generated by the motion of the camera on the bus while keeping parameter assumptions to a minimum. First, we estimate the Focus of Expansion, which is used to model and simulate 3D points given the intrinsic parameters of the camera; we then perform multiple linear regression to estimate the regression equation parameters and apply them to the real data of every frame to identify moving objects. We validated our algorithm using data taken from a common bus route in the city of Reno.
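
A crude sketch of the underlying idea: fit a simple model of the background (ego-motion) flow as a function of pixel position via linear regression and flag pixels whose observed flow deviates from it. This stands in for, and simplifies, the paper's Focus-of-Expansion modeling and multiple linear regression; the Farneback flow parameters and the threshold are assumptions.

```python
import cv2
import numpy as np
from sklearn.linear_model import LinearRegression

def moving_object_mask(prev_gray, gray, residual_thresh=2.0):
    """Flag pixels whose optical flow disagrees with a linear ego-motion model."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = gray.shape
    ys, xs = np.mgrid[0:h, 0:w]
    X = np.column_stack([xs.ravel(), ys.ravel()])      # pixel coordinates
    Y = flow.reshape(-1, 2)                            # observed flow vectors
    reg = LinearRegression().fit(X, Y)                 # background (ego-motion) flow model
    residual = np.linalg.norm(Y - reg.predict(X), axis=1).reshape(h, w)
    return residual > residual_thresh                  # True where motion disagrees with the model

# Synthetic illustration: a horizontally shifted frame simulates ego-motion.
prev = (np.random.rand(120, 160) * 255).astype(np.uint8)
curr = np.roll(prev, 2, axis=1)
print(moving_object_mask(prev, curr).mean())
```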