ISVC'20: 15TH INTERNATIONAL SYMPOSIUM ON VISUAL COMPUTING
PROGRAM FOR MONDAY, OCTOBER 5TH

09:00-10:00 Session 2: Keynote - Aaron Hertzmann
Location: K
09:00
Can Computers Create Art?

ABSTRACT. In this talk, I will discuss whether computers, using Artificial Intelligence (AI), could create art. I will cover the history of automation in art, examining the hype and reality of AI tools for art, together with predictions about how they will be used. I will also discuss different scenarios for how an algorithm could be considered the author of an artwork, which, I argue, comes down to questions of why we create and appreciate artwork.

10:10-11:10 Session 3A: Deep Learning I
Location: A
10:10
Regularization and Sparsity for Adversarial Robustness and Stable Attribution
PRESENTER: Daniel Schwartz

ABSTRACT. In recent years, deep neural networks (DNNs) have had great success in machine learning and pattern recognition. It has been shown that these networks can match or exceed human-level performance in difficult image recognition tasks. However, recent research has raised a number of critical questions about the robustness and stability of these deep learning architectures. Specifically, it has been shown that they are prone to adversarial attacks, i.e., perturbations added to input images to fool the classifier, and furthermore, trained models can be highly unstable to hyperparameter changes. In this work, we craft a series of experiments with multiple deep learning architectures, varying adversarial attacks, and different class attribution methods on the CIFAR-10 dataset in order to study the effect of sparse regularization on the robustness (accuracy and stability) of deep neural networks. Our results both qualitatively show and empirically quantify the amount of protection and stability sparse representations lend to machine learning robustness in the context of adversarial examples and class attribution.
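A minimal sketch of the kind of gradient-based perturbation the abstract refers to, here FGSM, one of the standard attacks used in such experiments (PyTorch; the function name and epsilon are illustrative, not the authors' code):

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, images, labels, epsilon=8 / 255):
    """Fast Gradient Sign Method: nudge inputs along the sign of the loss gradient."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    adv = images + epsilon * images.grad.sign()   # one-step perturbation
    return adv.clamp(0.0, 1.0).detach()           # keep pixels in valid range
```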

10:30
Self-Competitive Neural Networks
PRESENTER: Iman Saberi

ABSTRACT. Deep Neural Networks (DNNs) have improved the accuracy of classification problems in many applications. One of the challenges in training a DNN is its need for an enriched dataset to increase its accuracy and avoid overfitting. One way to improve the generalization of DNNs is to augment the training data with new synthesized adversarial samples. Recently, researchers have worked extensively to propose methods for data augmentation. In this paper, we generate adversarial samples to refine the Domains of Attraction (DoAs) of each class. In this approach, at each stage, we use the model learned from the primary and generated adversarial data (up to that stage) to manipulate the primary data in a way that looks complicated to the DNN. The DNN is then retrained using the augmented data, and again generates adversarial data that are hard for itself to predict. As the DNN tries to improve its accuracy by competing with itself (generating hard samples and then learning them), the technique is called a Self-Competitive Neural Network (SCNN). To generate such samples, we pose the problem as an optimization task in which the network weights are fixed, and we use a gradient descent based method to synthesize adversarial samples that lie on the boundary between their true labels and the nearest wrong labels. Our experimental results show that data augmentation using SCNNs can significantly increase the accuracy of the original network. For example, the accuracy of a CNN trained on a limited set of 1000 MNIST training samples improves from 94.26% to 98.25%.
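A hedged sketch of the boundary-seeking synthesis described above: weights frozen, the input optimized so its true-class logit approaches the nearest wrong-class logit (a simplification; names and step sizes are illustrative):

```python
import torch

def synthesize_hard_sample(model, x, true_label, steps=20, lr=0.05):
    """Push input x toward the decision boundary between its true class and
    the nearest wrong class while the network weights stay fixed."""
    x_adv = x.clone().detach().requires_grad_(True)
    optimizer = torch.optim.SGD([x_adv], lr=lr)
    for _ in range(steps):
        logits = model(x_adv)                      # shape (1, n_classes)
        masked = logits.detach().clone()
        masked[0, true_label] = float("-inf")      # hide the true class
        j = masked[0].argmax().item()              # nearest wrong label
        margin = logits[0, true_label] - logits[0, j]
        optimizer.zero_grad()
        margin.backward()                          # shrink the margin
        optimizer.step()
    return x_adv.detach()
```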

10:50
A Novel Contractive GAN Model for a Unified Approach Towards Blind Quality Assessment of Images from Heterogeneous Sources
PRESENTER: Tan Lu

ABSTRACT. The heterogeneous distributions of pixel intensities between natural scene and document images cast challenges for generalizing quality assessment models across these two types of images, where human perceptual scores and optical character recognition accuracy are the respective quality metrics. In this paper we propose a novel contractive generative adversarial model to learn a unified quality-aware representation of images from heterogeneous sources in a latent domain. We then build a unified image quality assessment framework by applying a regressor in the unveiled latent domain, where the regressor operates as if it were assessing the quality of a single type of image. Test results on blur distortion across three benchmark datasets show that the proposed model achieves promising performance, competitive with the state of the art simultaneously for natural scene and document images.

10:10-11:10 Session 3B: Segmentation
Chair:
Location: B
10:10
Towards Optimal Ship Navigation Using Image Processing
PRESENTER: Bekir Sahin

ABSTRACT. Shipping transportation has developed over the years with technological advancements. Modern ship navigation is conducted with the help of Automatic Radar Plotting Aid (ARPA) and Electronic Chart Display and Information System (ECDIS). The location map, marine traffic, geographical conditions, and obstacles in a region can be monitored with these technologies. The obstacles may vary from icebergs and ice blocks to islands, debris, rocks, or other vessels in a given vicinity. In this study, we propose an approach for route optimization using two-dimensional radar images and image segmentation in an environment with obstacles. The navigation algorithm takes image segmentation results as input and finds the optimal (i.e., safest and shortest) route. One of the advantages of this study is that the obstacles are not solely polygonal; they may be of any shape, size, and color. The proposed approach has some practical and computational limitations; however, future unmanned vessels could benefit from improved applications of this route optimization approach in terms of energy consumption, time, and workforce.

10:30
Overscan Detection in Digitized Analog Films by Precise Sprocket Hole Segmentation
PRESENTER: Daniel Helm

ABSTRACT. Automatic video analysis is explored in order to understand and interpret real-world scenes automatically. For digitized historical analog films, this process is influenced by the video quality, video composition, or scan artifacts known as overscanning. The main aim of this paper is to find the Sprocket Holes (SH) in digitized analog film frames in order to drop unwanted overscan areas and extract the correctly scaled final frame content, which includes the most significant frame information. The outcome of this investigation is a precise overscan detection pipeline which combines the advantages of supervised segmentation networks such as DeepLabV3 with an unsupervised Gaussian Mixture Model for fine-grained segmentation based on histogram features. Furthermore, this exploration demonstrates the strength of using low-level backbone features in combination with low-cost CNN architectures like SqueezeNet in terms of inference runtime and segmentation performance. Moreover, a pipeline for creating photo-realistic frame samples to build a self-generated dataset is introduced and used in the training and validation phases. This dataset consists of 15000 image-mask pairs including synthetically created and deformed SHs that respect the exact film reel layout geometry. The approach is evaluated on real-world historical film frames including original SHs and deformations such as scratches, cracks, or wet splices. The proposed approach reaches a Mean Intersection over Union (mIoU) score of 0.9509 (@threshold: 0.5) as well as a Dice Coefficient of 0.974 (@threshold: 0.5) and outperforms state-of-the-art solutions. Finally, we provide full access to our source code as well as the self-generated dataset in order to promote further research on digitized analog film analysis and fine-grained object segmentation.
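A minimal sketch of the unsupervised refinement idea, fitting a two-component Gaussian mixture to grey levels inside the coarse network mask and keeping the brighter component (an assumed simplification; the paper's histogram features may differ):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def refine_mask_with_gmm(gray, coarse_mask):
    """Refine a coarse sprocket-hole mask with a 2-component GMM on intensities."""
    vals = gray[coarse_mask > 0].reshape(-1, 1).astype(np.float64)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(vals)
    bright = int(np.argmax(gmm.means_.ravel()))   # sprocket holes scan bright
    labels = gmm.predict(gray.reshape(-1, 1).astype(np.float64))
    return (labels.reshape(gray.shape) == bright) & (coarse_mask > 0)
```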

10:50
Pixel-level Corrosion Detection on Metal Constructions by Fusion of Deep Learning Semantic and Contour Segmentation

ABSTRACT. Corrosion detection on metal constructions is a major challenge in civil engineering for quick, safe and effective inspection. Existing image analysis approaches tend to place bounding boxes around the defective region, which is not adequate either for structural analysis or for pre-fabrication, an innovative construction concept which reduces maintenance cost and time and improves safety. In this paper, we apply three semantic segmentation-oriented deep learning models (FCN, U-Net and Mask R-CNN) for corrosion detection, which perform better in terms of accuracy and time and require a smaller number of annotated samples compared to other deep models, e.g. CNN. However, the final images derived are still not sufficiently accurate for structural analysis and pre-fabrication. Thus, we adopt a novel data projection scheme that fuses the results of color segmentation, which yields accurate but over-segmented contours of a region, with a processed area of the deep masks, resulting in high-confidence corroded pixels.

10:10-11:10 Session 3C: Visualization
Location: C
10:10
Reference-Based Color Transfer for Medical Volume Rendering

ABSTRACT. The benefits of medical imaging are enormous. Medical images provide considerable amounts of anatomical information, and this facilitates medical practitioners in performing effective disease diagnosis and deciding upon the best course of medical treatment. A transition from traditional monochromatic medical images like CT scans, X-rays, or MRI images to a colored 3D representation of the anatomical structure further enhances the capabilities of medical professionals in extracting valuable medical information. The proposed framework in our research starts with performing color transfer by finding deep semantic correspondence between two medical images: a colored reference image, and a monochromatic CT scan or an MRI image. We extend this idea of reference-based colorization to perform colored volume rendering from a stack of grayscale medical images. Furthermore, we also propose an effective reference image recommendation system to aid the selection of good reference images. With our approach, we successfully perform colored medical volume visualization and essentially eliminate the painstaking process of user interaction with a transfer function to obtain color parameters for volume rendering.

10:30
An Empirical Methodological Study of Evaluation Methods Applied to Educational Timetabling Visualizations

ABSTRACT. The conception, and usage, of methods designed to evaluate information visualizations is a challenge that goes along with the development of these visualizations. In the scientific literature there is a myriad of proposals for such methods. However, none of them has been able to pacify the field or establish itself as a "de facto" standard, due to difficulties like: (a) the complexity of its usage; (b) high financial and time costs; and (c) the need for a large number of raters to guarantee the reliability of the results. One way to circumvent such adversities is the usage of Heuristic-Based Evaluations, thanks to their simplicity, low cost, speed of application, and the quality of the results reached. This article conducts an empirical methodological study of the use of three heuristic-based methods (Zuk et al., Forsell & Johansson, and Wall et al.) for the evaluation of visualizations in the context of Educational Timetabling Problems. Five different visualizations, extracted from the literature, were evaluated using the original methods and versions modified by the authors (where an importance factor was assigned to each statement being evaluated, as well as the level of confidence of the rater in his/her assessment) in order to improve their efficiency when measuring the quality of visualizations. The experimental results demonstrated that for the first two heuristics, only the modification on the importance of the statements proved to be (statistically) relevant. For the third one, neither factor induced different results.

10:50
Real-Time Contrast Enhancement for 3D Medical Image Stacks
PRESENTER: Jurgen Schulze

ABSTRACT. Medical professionals rely on medical imaging to help diagnose and treat patients. It is therefore important for them to be able to see all the details captured in the images. Contrast enhancement or noise reduction techniques are often used to help improve image quality. This paper introduces a real-time implementation of 3D Contrast Limited Adaptive Histogram Equalization (CLAHE) to enhance 3D medical image stacks, or volumes. This algorithm can be used interactively by medical doctors to help visualize 3D medical volumes and prepare for surgery. The paper also introduces two novel extensions to the algorithm that allow a user to interactively decide which region to focus the enhancement on: Focused CLAHE and Masked CLAHE. Focused CLAHE applies the 3D CLAHE algorithm to a specified block of the entire medical volume, and Masked CLAHE applies the algorithm to a selected organ or organs. These three contributions can be used not only to help improve the visualization of 3D medical image stacks, but also to provide that contrast enhancement in real time.
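A rough CPU-side illustration of the Masked CLAHE idea, applied slice-wise with scikit-image (the paper implements true 3D CLAHE in real time; names and parameters here are illustrative):

```python
import numpy as np
from skimage.exposure import equalize_adapthist

def masked_clahe(volume, mask, clip_limit=0.01):
    """Slice-wise approximation of Masked CLAHE: enhance the whole stack,
    then keep the enhancement only inside the organ mask."""
    vol = volume.astype(np.float64)
    vol = (vol - vol.min()) / (np.ptp(vol) + 1e-8)   # scale to [0, 1]
    enhanced = np.stack([equalize_adapthist(s, clip_limit=clip_limit) for s in vol])
    return np.where(mask > 0, enhanced, vol)          # blend via the mask
```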

11:10-11:30 Coffee Break
11:30-12:10 Session 4A: Deep Learning I
Location: A
11:30
Nonconvex Regularization for Network Slimming: Compressing CNNs Even More
PRESENTER: Kevin Bui

ABSTRACT. In the last decade, convolutional neural networks (CNNs) have evolved to become the dominant models for various computer vision tasks, but they cannot be deployed in low-memory devices due to their high memory requirements and computational cost. One popular, straightforward approach to compressing CNNs is network slimming, which imposes an $\ell_1$ penalty on the channel-associated scaling factors in the batch normalization layers during training. In this way, channels with low scaling factors are identified as insignificant and are pruned from the models. In this paper, we propose replacing the $\ell_1$ penalty with the $\ell_p$ and transformed $\ell_1$ (T$\ell_1$) penalties, since these nonconvex penalties outperformed $\ell_1$ in yielding sparser satisfactory solutions in various compressed sensing problems. In our numerical experiments, we demonstrate network slimming with $\ell_p$ and T$\ell_1$ penalties on VGGNet and DenseNet trained on CIFAR 10/100. The results demonstrate that the nonconvex penalties compress CNNs better than $\ell_1$. In addition, T$\ell_1$ preserves the model accuracy after channel pruning, and $\ell_{1/2}$ and $\ell_{3/4}$ yield compressed models with accuracies similar to $\ell_1$ after retraining.
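A sketch of how the transformed $\ell_1$ penalty on the batch-normalization scaling factors could enter the training loss (the parameter $a$ and weight $\lambda$ are illustrative; this is not the authors' implementation):

```python
import torch
import torch.nn as nn

def transformed_l1(x, a=1.0):
    """Transformed L1 penalty: T_a(x) = (a + 1)|x| / (a + |x|), elementwise."""
    ax = x.abs()
    return ((a + 1.0) * ax / (a + ax)).sum()

def slimming_penalty(model, lam=1e-4, a=1.0):
    """Sum the nonconvex penalty over all batch-norm scaling factors (gamma)."""
    penalty = 0.0
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            penalty = penalty + transformed_l1(m.weight, a)
    return lam * penalty

# during training: loss = criterion(output, target) + slimming_penalty(model)
```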

11:50
Biologically Inspired Sleep Algorithm for Variational Auto-Encoders
PRESENTER: Sameerah Talafha

ABSTRACT. Variational auto-encoders (VAEs) are a class of likelihood-based generative models that operate by providing an approximation to the problem of inference by introducing a latent variable and encoder/decoder components. However, the latent codes usually have no structure, are not informative, and are not interpretable. This problem is amplified if these models need to be used for auxiliary tasks or when different aspects of the generated samples need to be controlled or interpreted by humans. We address these issues by proposing a biologically realistic sleep algorithm for VAEs (VAE-sleep). The algorithm augments the normal training phase of the VAE with an unsupervised learning phase in an equivalent spiking VAE, modeled after how the human brain learns, using the Mirrored Spike Timing Dependent Plasticity learning rule. We hypothesize that the proposed unsupervised VAE-sleep phase creates more realistic feature representations, which in turn increase a VAE's robustness in reconstructing the input. We conduct quantitative and qualitative experiments, including comparisons with the state of the art on three datasets: CelebA, MNIST, and Fashion-MNIST. We show that our model performs better than the standard VAE and variational sparse coding (VSC) on a benchmark classification task, demonstrating improved classification accuracy and significantly increased robustness to the number of latent dimensions. Our experiments suggest that the proposed method improves upon other widely used methods and performs favorably under the PSNR, SSIM, and LPIPS metrics. The quantitative evaluations also suggest that our model can generate more realistic images compared to the state of the art when tested on disturbed or noisy inputs.

11:30-12:10 Session 4B: Segmentation
Chair:
Location: B
11:30
CSC-GAN: Cycle and semantic consistency for dataset augmentation

ABSTRACT. Image-to-image translation is a computer vision problem where the task is to learn a mapping from a source domain A to a target domain B using a training set. However, this translation is not always accurate, and relevant semantic information can deteriorate during the translation process. To handle this problem, we propose a new cycle-consistent, adversarially trained image-to-image translation model with a loss function that is constrained by semantic segmentation. This formulation encourages the model to preserve semantic information during the translation process. For this purpose, our loss function evaluates the accuracy of the synthetically generated image against a previously trained semantic segmentation model. Reported results show that our proposed method can significantly increase the level of detail in the synthetic images. We further demonstrate our method's effectiveness by applying it as a dataset augmentation technique for a minimal dataset, showing that it can improve semantic segmentation accuracy.
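A hedged sketch of the semantic constraint: the translated image is scored by a frozen, previously trained segmentation network, so gradients push the generator to preserve labels (function and weight names are illustrative):

```python
import torch.nn.functional as F

def semantic_consistency_loss(seg_model, fake_b, seg_labels):
    """Score the translated image with a frozen, pre-trained segmentation
    network; gradients flow back into the generator through fake_b."""
    for p in seg_model.parameters():
        p.requires_grad_(False)          # keep the segmenter frozen
    logits = seg_model(fake_b)           # (B, C, H, W)
    return F.cross_entropy(logits, seg_labels)

# total generator loss (weights illustrative):
# loss = gan_loss + lambda_cyc * cycle_loss + lambda_sem * semantic_consistency_loss(...)
```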

11:50
Improvements on the Superpixel Hierarchy Algorithm with Applications to Image Segmentation and Saliency Detection

ABSTRACT. Superpixel techniques aim to divide an image into a predefined number of regions or groups of pixels in order to facilitate operations such as segmentation. However, finding the optimal number of regions for each image is a difficult task due to the large variation of features observed across images. With the help of edge and color information, however, we can target an ideal number of regions for each image. This work presents two modifications to the known Superpixel Hierarchy algorithm. These changes aim to define the number of superpixels automatically through edge information at different orientations and the Hue channel of the HSV color model. The results are presented quantitatively and qualitatively for edge detection and saliency estimation problems. The experiments were conducted on the BSDS500 and ECSSD datasets.

11:30-12:10 Session 4C: Visualization
Location: C
11:30
Flow Map Processing by Space-Time Deformation
PRESENTER: Thomas Wilde

ABSTRACT. In Flow Visualization, the consideration of flow maps instead of velocity fields has recently moved into the focus of research. We present an approach to transforming standard techniques in vector field processing -- like smoothing, modeling, deformation -- to flow maps. This requires a solution to the fundamental problem that -- contrary to vector fields -- a certain modification of a flow map is, in general, not a flow map anymore. We introduce a concept that enables the modification of a discrete sampling of a flow map while enforcing the flow map properties. Based on this, we present approaches for flow map deformation that are applied to a 2D time-dependent flow field.

11:50
GenExplorer: Visualizing and Comparing Gene Expression Levels via Differential Charts
PRESENTER: Chau Pham

ABSTRACT. This paper describes a visual interface for analyzing gene expression data generated from multiple biological samples under different controlled conditions. The tasks are to provide a comprehensive overview of thousands of genes under different states and to have an intuitive way to narrow down the set of genes with common behaviors. Our method involves using multidimensional projections and differential charts to help users analyze different sets of data via a web-based interface. Incorporating these charts and other visualization techniques into our final design makes the application accessible to genetics analysts, as demonstrated in the two use cases in plant and cancer research. We further discuss the feedback from domain experts and the limitations of our approach in accommodating gene exploration tasks.

12:10-13:30 Lunch Break
13:30-14:30 Session 5: Keynote - Victoria Interrante
Location: K
13:30
Spatial Perception and Presence in Virtual Architectural Environments

ABSTRACT. Immersive Virtual Reality (VR) technology has tremendous potential applications in architecture and design. In this talk I will review some of the work being done in my lab to enhance the utility of VR for architecture and design applications, focusing primarily on the investigation of factors influencing spatial perception accuracy in immersive architectural environments, but also including the use of VR technology to investigate questions of interest to architectural and interior designers such as how wallpaper patterns and window features affect people’s subjective experience in architectural interiors.

15:15-16:15 Session 6A: Video Analysis and Event Recognition
Location: A
15:15
An Event-Based Hierarchical Method for Customer Activity Recognition in Retail Stores
PRESENTER: Jiahao Wen

ABSTRACT. Customer Activity (CA) provides valuable information for marketing. CA is a collective name for customer information obtained from on-the-spot observation in retail environments. Existing methods of Customer Activity Recognition (CAR) recognize CA with specialized end-to-end (e2e) models. Consequently, when marketing requires changing recognition targets, specialized e2e models are not reconfigurable to fit different marketing demands unless the models are rebuilt entirely. Besides, redundant computation in existing CAR systems leads to low efficiency, and the low maintainability of such systems results in many modifications when updating their methods. In this research, we decompose behaviors into several primitive units called "events." We propose an event-based CAR method to achieve reconfigurability and design a hierarchy to address the redundancy and maintainability issues. The evaluation results show that our proposed method can adapt to different marketing demands and performs better than existing methods.

15:35
Fully Autonomous UAV-based Action Recognition System Using Aerial Imagery
PRESENTER: Han Peng

ABSTRACT. Human action recognition is an important topic in artificial intelligence with a wide range of applications including surveillance systems, search-and-rescue operations, human-computer interaction, etc. However, most current action recognition systems utilize videos captured by stationary cameras. Another emerging technology is the use of unmanned ground and aerial vehicles (UAV/UGV) for different tasks such as transportation, traffic control, border patrolling, wild-life monitoring, etc. This technology has become more popular in recent years due to its affordability, high maneuverability, and limited human intervention. However, an efficient action recognition algorithm for UAV-based monitoring platforms does not yet exist. This paper considers UAV-based video action recognition by addressing the key issues of aerial imaging systems such as camera motion and vibration, low resolution, and tiny human size. In particular, we propose an automated deep learning-based action recognition system which includes three stages: video stabilization using SURF feature selection and the Lucas-Kanade method, human action area detection using faster region-based convolutional neural networks (R-CNN), and action recognition. We propose a novel structure that extends and modifies the InceptionResNet-v2 architecture by combining a 3D CNN architecture and a residual network for action recognition. We achieve an average accuracy of 85.83% for entire-video-level recognition when applying our algorithm to the popular UCF-ARG aerial imaging dataset. This accuracy significantly improves upon the state-of-the-art accuracy by a margin of 17%.
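A minimal sketch of the stabilization stage (ORB keypoints substitute for SURF, which sits in OpenCV's non-free contrib module; the rest follows the pyramidal Lucas-Kanade tracking plus affine-warp recipe):

```python
import cv2
import numpy as np

def stabilize_pair(prev_gray, cur_gray):
    """Estimate and remove inter-frame camera motion: detect keypoints,
    track them with pyramidal Lucas-Kanade, fit a partial affine warp."""
    orb = cv2.ORB_create(nfeatures=500)
    kps = orb.detect(prev_gray, None)
    p0 = np.float32([kp.pt for kp in kps]).reshape(-1, 1, 2)
    p1, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray, p0, None)
    good0 = p0[status.ravel() == 1]
    good1 = p1[status.ravel() == 1]
    M, _ = cv2.estimateAffinePartial2D(good1, good0, method=cv2.RANSAC)
    h, w = cur_gray.shape
    return cv2.warpAffine(cur_gray, M, (w, h))   # warp current frame back
```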

15:55
Hierarchical Action Classification with Network Pruning

ABSTRACT. Research on human action classification has made significant progress in the past few years. Most deep learning methods focus on improving performance by adding more network components. We propose, however, to better utilize auxiliary mechanisms, including hierarchical classification, network pruning, and skeleton-based preprocessing, to boost the model robustness and performance. We test the effectiveness of our method on three commonly used testing datasets: NTU RGB+D 60, NTU RGB+D 120, and Northwestern-UCLA Multiview Action 3D. Our experiments show that our method can achieve either comparable or better performance than state-of-the-art methods on all three datasets. In particular, our method sets up a new baseline for NTU 120, the largest dataset among the three. We also analyze our method with extensive comparisons and ablation studies.
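The abstract names network pruning as one auxiliary mechanism; a generic magnitude-pruning sketch with PyTorch's built-in utilities (not the paper's specific scheme):

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

def prune_conv_layers(model, amount=0.3):
    """Zero out the smallest-magnitude 30% of weights in every conv layer
    (generic L1 unstructured pruning; retrain afterwards to recover accuracy)."""
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            prune.l1_unstructured(module, name="weight", amount=amount)
            prune.remove(module, "weight")   # make the pruning permanent
    return model
```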

15:15-16:15 Session 6B: Special Track: Computational Bioimaging
Location: B
15:15
Ensemble Convolutional Neural Networks for the Detection of Microscopic Fusarium Oxysporum

ABSTRACT. The Panama disease has been reported to wipe out banana plantations due to the fungal pathogen known as Fusarium oxysporum f. sp. Cubense Tropical Race 4, or Foc TR4. Currently, there are no proven methods to control the spread of the disease. This study aims to develop an early detection model for Foc TR4 to minimize damage to infected plantations. In line with this, CNN models using the ResNet50 architecture were utilized to classify the presence of Foc TR4 in a given microscopy image of a soil sample. Fungi samples were lab-cultivated, and images were taken using a lab microscope with three distinct microscopy configurations under LPO magnification. The initial results have shown that brightfield and darkfield images are generally more helpful in the automatic classification of fungi. Gradient-weighted Class Activation Mapping (Grad-CAM) was used to validate the decision processes of the individual CNN models. The proposed ensemble model shows promising results, achieving an accuracy of 91.46%. The model is beneficial as a low-cost preliminary test that could be performed in areas suspected to be infected with the pathogen, given that the exported models can easily be implemented in a mobile system.
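One common way to fuse per-configuration models is to average their softmax outputs; a minimal sketch under that assumption (the paper's exact fusion rule may differ):

```python
import torch

def ensemble_predict(models, image):
    """Average the softmax outputs of the per-configuration CNNs
    (e.g., one model per microscopy setting)."""
    probs = [torch.softmax(m(image), dim=1) for m in models]
    return torch.stack(probs).mean(dim=0)   # fused class probabilities
```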

15:35
Offline versus Online Triplet Mining based on Extreme Distances of Histopathology Patches
PRESENTER: Milad Sikaroudi

ABSTRACT. We analyze the effect of offline and online triplet mining on a colorectal cancer (CRC) histopathology dataset containing 100,000 patches. We consider extreme cases, i.e., the farthest and nearest patches with respect to a given anchor, in both online and offline mining. While many works focus solely on how to select triplets online (batch-wise), we also study the effect of extreme distances and neighbor patches before training, in an offline fashion. We analyze the impact of extreme cases for offline versus online mining, including easy positive, batch semi-hard, and batch hard triplet mining, as well as the neighborhood component analysis loss, its proxy version, and distance weighted sampling. We also investigate online approaches based on extreme distances and comprehensively compare the performance of offline and online mining based on the data patterns, explaining offline mining as a tractable generalization of online mining with a large mini-batch size. We further discuss the relations of different colorectal tissue types in terms of extreme distances. We found that, for a specific architecture such as ResNet-18 in this study, offline and online mining approaches have comparable performance. Moreover, we found the assorted case, which includes different cases of extreme distances, to be promising, especially in the online approach.
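A minimal sketch of one of the compared strategies, batch-hard online mining: each anchor takes its farthest in-batch positive and nearest in-batch negative (margin value illustrative):

```python
import torch

def batch_hard_triplet_loss(embeddings, labels, margin=0.5):
    """Batch-hard mining: for each anchor pick its farthest positive and
    nearest negative within the mini-batch, then apply a hinge margin."""
    d = torch.cdist(embeddings, embeddings)                # pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)      # positive mask
    pos = d.masked_fill(~same, float("-inf")).max(dim=1).values  # hardest positive
    neg = d.masked_fill(same, float("inf")).min(dim=1).values    # hardest negative
    return torch.relu(pos - neg + margin).mean()
```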

15:55
Multi-Label Classification of Panoramic Radiographic Images using a Convolutional Neural Network
PRESENTER: Denis Salvadeo

ABSTRACT. Dentistry is one of the areas with the greatest potential for the application of machine learning techniques, such as convolutional neural networks (CNNs). This potential derives from the fact that several of the typical diagnostic methods in dentistry are based on image analysis, such as diverse types of X-ray images. Typically, these analyses require an empirical and specialized assessment by the professional. In this sense, machine learning can contribute tools to aid professionals in dentistry, such as image classification, whose objective is to classify and identify patterns and classes in a set of images. The objective of the current study is to develop an algorithm based on a convolutional neural network with the ability to independently identify six specific classes in panoramic X-ray images, also known as orthopantomographies, and classify the images accordingly. The six independent classes are: presence of all 28 teeth, restoration, dental appliances, dental prosthesis, images with more than 32 teeth, and images with missing teeth. The workflow was based on a DOE (Design of Experiments) study, considering the neural network architecture variables as factors, in order to identify the most significant ones (those that contribute most to improving the fitness of the network) and the interactions between them, so as to optimize the network architecture based on the F1 and recall scores. The obtained results are promising: for the optimal network architecture, F1 and recall scores of 87% and 86%, respectively, were obtained.
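Six independent classes imply a multi-label setup rather than a softmax over mutually exclusive classes; a minimal sketch of such a head (layer sizes are illustrative):

```python
import torch
import torch.nn as nn

class MultiLabelHead(nn.Module):
    """Six independent logits, one per radiographic finding."""
    def __init__(self, in_features=512, n_labels=6):
        super().__init__()
        self.fc = nn.Linear(in_features, n_labels)

    def forward(self, features):
        return self.fc(features)            # raw logits, one per label

# training: BCEWithLogitsLoss treats each of the 6 classes independently
criterion = nn.BCEWithLogitsLoss()
# prediction: present = torch.sigmoid(logits) > 0.5
```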

15:15-16:15 Session 6C: Applications
Location: C
15:15
Lightless Fields: Enhancement and Denoising of Light-deficient Light Fields
PRESENTER: Carson Vogt

ABSTRACT. Modern focused light field cameras are capable of capturing video at over 160 frames per second, but in doing so sacrifice shutter speed. Outside of laboratory environments, lighting can be problematic, resulting in noisy light fields and poor depth reconstruction. To enhance and denoise modern focused light field cameras, we create a unique deep neural network that allows the full light field to be processed at once, eliminates stitching artifacts, and takes advantage of feature redundancy between neighboring microlenses. We show that our network, ENH-W, significantly outperforms other architectures in both visual and depth metrics.

15:35
FA3D: Fast and Accurate 3D Object Detection
PRESENTER: Selameab Demilew

ABSTRACT. Fast and accurate detection of objects, in 3D, is one of the critical components in an advanced driver assistance system. In this paper, we aim to develop an accurate 3D object detector that runs in near real-time on low-end embedded systems. We propose an efficient framework that converts a raw point cloud into a 3D occupancy cuboid and detects cars using a deep convolutional neural network. Even though the complexity of our proposed model is high, it runs at 7.27 FPS on a Jetson Xavier and at 57.83 FPS on a high-end workstation, which is 18% and 43% faster than the fastest published method, while having performance comparable with state-of-the-art models on the KITTI dataset. We conduct a comprehensive error analysis on our model and show that two quantities are the principal sources of error among the nine predicted attributes. Our source code is available at https://github.com/Selameab/FA3D
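A minimal sketch of converting a raw cloud into a binary occupancy cuboid (ranges and voxel size are illustrative, roughly KITTI-like; not the authors' code):

```python
import numpy as np

def occupancy_cuboid(points, x_range=(0, 70), y_range=(-40, 40),
                     z_range=(-3, 1), voxel=0.1):
    """Quantize a raw LiDAR point cloud (N x 3) into a binary 3D occupancy grid."""
    lo = np.array([x_range[0], y_range[0], z_range[0]])
    hi = np.array([x_range[1], y_range[1], z_range[1]])
    keep = np.all((points >= lo) & (points < hi), axis=1)   # crop to the ROI
    idx = ((points[keep] - lo) / voxel).astype(np.int32)
    shape = np.ceil((hi - lo) / voxel).astype(np.int32)
    grid = np.zeros(shape, dtype=np.uint8)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1               # mark occupied voxels
    return grid
```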

15:55
Generalized Inverted Dirichlet Optimal predictor for Image inpainting
PRESENTER: Fatma Najar

ABSTRACT. Predicting a given pixel from its surrounding neighbouring pixels is of great interest for several image processing tasks. Previous works focused on developing different Gaussian-based models; however, in real-world applications, the image texture and clutter are usually known to be non-Gaussian. In this paper, we develop a pixel prediction framework based on a finite generalized inverted Dirichlet (GID) mixture model that has proven its efficiency in several machine learning applications. We propose a GID optimal predictor, and we learn its parameters using a likelihood-based approach combined with the Newton-Raphson method. We demonstrate the efficiency of our proposed approach on a challenging application, namely image inpainting, and we compare the experimental results with related-work methods.
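For reference, the Newton-Raphson iteration used in such likelihood-based learning takes the standard form $\theta^{(t+1)} = \theta^{(t)} - H^{-1}(\theta^{(t)})\,\nabla_\theta \mathcal{L}(\theta^{(t)})$, where $\mathcal{L}$ is the log-likelihood of the GID mixture, $\nabla_\theta \mathcal{L}$ its gradient, and $H$ its Hessian with respect to the parameters $\theta$ (the GID-specific gradient and Hessian are derived in the paper).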

16:15-16:35 Coffee Break
16:35-17:15 Session 7A: Video Analysis and Event Recognition
Location: A
16:35
An Approach Towards Action Recognition using Part Based Hierarchical Fusion
PRESENTER: Aditya Agarwal

ABSTRACT. The human body can be represented as an articulation of rigid and hinged joints which can be combined to form the parts of the body. Human actions can be thought of as the collective action of these parts. Hence, learning an effective spatio-temporal representation of the collective motion of these parts is key to action recognition. In this work, we propose an end-to-end pipeline for the task of human action recognition on video sequences using 2D joint trajectories estimated from a pose estimation framework. We use a Hierarchical Bidirectional Long Short Term Memory Network (HBLSTM) to model the spatio-temporal dependencies of the motion by fusing the pose-based joint trajectories in a part-based hierarchical fashion. To demonstrate the effectiveness of our proposed approach, we compare its performance with six comparative architectures based on our model, as well as with several other methods on the widely used KTH and Weizmann action recognition datasets. Experimental results demonstrate that our proposed method outperforms the existing state of the art.
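A hedged sketch of the part-based hierarchical fusion: one bidirectional LSTM per body part over its joint trajectories, then a body-level bidirectional LSTM over the concatenated part encodings (part grouping and sizes are illustrative, not the paper's configuration):

```python
import torch
import torch.nn as nn

class PartHierarchyBLSTM(nn.Module):
    """Per-part BLSTMs over 2D joint trajectories, fused by a body-level BLSTM."""
    def __init__(self, joints_per_part=(5, 5, 5, 5), hidden=64, n_classes=6):
        super().__init__()
        self.part_lstms = nn.ModuleList(
            nn.LSTM(2 * j, hidden, batch_first=True, bidirectional=True)
            for j in joints_per_part)
        self.body_lstm = nn.LSTM(2 * hidden * len(joints_per_part), hidden,
                                 batch_first=True, bidirectional=True)
        self.cls = nn.Linear(2 * hidden, n_classes)

    def forward(self, parts):               # list of (B, T, 2*joints) tensors
        enc = [lstm(p)[0] for lstm, p in zip(self.part_lstms, parts)]
        body, _ = self.body_lstm(torch.cat(enc, dim=-1))
        return self.cls(body[:, -1])        # classify from the last time step
```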

16:35-17:15 Session 7B: Special Track: Computational Bioimaging
Location: B
16:35
Ink Marker Segmentation in Histopathology Images Using Deep Learning
PRESENTER: Danial Maleki

ABSTRACT. Due to the recent advancements in machine vision, digital pathology has gained significant attention. Histopathology images are distinctly rich in visual information. The tissue glass slide images are utilized for disease diagnosis. Researchers study many methods to process histopathology images and facilitate fast and reliable diagnosis; therefore, the availability of high-quality slides becomes paramount. The quality of the images can be negatively affected when the glass slides are ink-marked by pathologists to delineate regions of interest. As an example, in one of the largest public histopathology datasets, The Cancer Genome Atlas (TCGA), approximately 12% of the digitized slides are affected by manual delineations through ink markings. To process these open-access slide images and other repositories for the design and validation of new methods, an algorithm to detect the marked regions of the images is essential, so that computer methods do not confuse tissue pixels with ink-colored pixels. In this study, we propose to segment the ink-marked areas of pathology patches with a deep network. A dataset of 4,305 patches from 79 whole slide images was created, and different networks were trained. The results showed that an FPN model with EfficientNet-B3 as the backbone was the superior configuration, with an F1 score of 94.53%.

16:55
P-FideNet: Plasmodium Falciparum Identification Neural Network
PRESENTER: Rodrigo Veras

ABSTRACT. Malaria is a blood disease caused by Plasmodium parasites transmitted through the bite of the female Anopheles mosquito. The identification of parasitized blood cells is a laborious and challenging task, as it involves convoluted procedures such as spotting the parasite in the blood and counting the number of parasites. This examination can be arduous for large-scale diagnoses, resulting in poor quality. This paper presents a new Convolutional Neural Network (CNN) architecture named P-FideNet aimed at the detection of Malaria. The proposed CNN model can be used to classify images of blood cells as infected or not infected by the parasite. This tool makes the analysis process carried out by the specialist faster and more accurate. Comparative tests were carried out against state-of-the-art works, and P-FideNet achieved 98.88% accuracy and 99% precision.

16:35-17:15 Session 7C: Applications
Location: C
16:35
BVNet: A 3D End-to-end Model Based on Point Cloud
PRESENTER: Nuo Cheng

ABSTRACT. Point cloud LiDAR data are increasingly used for detecting road situations for autonomous driving. The most important issues here are detection accuracy and processing time. In this study, we propose a new model for improving detection performance based on point clouds. A well-known difficulty in processing 3D point clouds is that the point data are unordered. To address this problem, we define 3D point cloud features in the grid cells of the bird's view according to the distribution of the points. In particular, we introduce the average and standard deviation of the heights as well as a distance-related density of the points as new features inside a cell. The resulting feature map is fed into a conventional neural network to obtain the outcomes, thus realizing an end-to-end real-time detection framework called BVNet (Bird's-View-Net). The proposed model is tested on the KITTI benchmark suite and the results show considerable improvement in detection accuracy compared with models without the newly introduced features.
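A minimal sketch of the per-cell bird's-view features (mean height, height standard deviation, and a simple point count standing in for the paper's distance-related density; grid and cell sizes are illustrative):

```python
import numpy as np

def bev_cell_features(points, grid=(352, 400), cell=0.2, y_half=40.0):
    """Per-cell bird's-view features from an unordered point cloud:
    count (density proxy), mean height, and height standard deviation."""
    xi = (points[:, 0] / cell).astype(np.int64)
    yi = ((points[:, 1] + y_half) / cell).astype(np.int64)
    ok = (xi >= 0) & (xi < grid[0]) & (yi >= 0) & (yi < grid[1])
    xi, yi, z = xi[ok], yi[ok], points[ok, 2]
    n, s, s2 = np.zeros(grid), np.zeros(grid), np.zeros(grid)
    np.add.at(n, (xi, yi), 1.0)          # points per cell
    np.add.at(s, (xi, yi), z)            # sum of heights
    np.add.at(s2, (xi, yi), z * z)       # sum of squared heights
    mean = np.divide(s, n, out=np.zeros_like(s), where=n > 0)
    var = np.divide(s2, n, out=np.zeros_like(s2), where=n > 0) - mean ** 2
    return np.stack([n, mean, np.sqrt(np.clip(var, 0, None))])
```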

16:55
Evaluating Single Image Dehazing Methods Under Realistic Sunlight Haze
PRESENTER: Zahra Anvari

ABSTRACT. Haze can degrade the visibility and the image quality drastically, thus degrading the performance of computer vision tasks such as object detection. Single image dehazing is a challenging and ill-posed problem, despite being widely studied. Most existing methods assume that haze has a uniform/homogeneous distribution and haze can have a single color,i.e.grayish white color similar to smoke, while in reality haze can be distributed non-uniformly with different patterns and colors. In this paper, we focus on haze created by sunlight as it is one of the most prevalent type of haze in the wild. Sunlight can generate non-uniformly distributed haze with drastic density changes due to sun rays and also a spectrum of haze color due to sunlight color changes during the day. This presents a new challenge to image dehazing methods. For these methods to be practical, this problem needs to be addressed. To quantify the challenges and assess the performance of these methods, we present a sunlight haze benchmark dataset, Sun-Haze, containing 107 hazy images with different types of haze created by sunlight having variety of intensity and color. We evaluate a representative set of state-of-the-art image dehazing methods on this benchmark dataset in terms of standard metrics such as PSNR, SSIM, CIEDE2000, PI and NIQE.This uncovers the limitation of the current methods, and questions their underlying assumptions as well as their practicality.