10:00 | Accuracy Improvement of Depth Estimation from Single Image by Using 3rd Player in GAN ABSTRACT. We propose an adversarial network for monocular depth estimation that synthesizes a depth map image from a single RGB input image. Unlike the regular generative adversarial network architecture, we extend the network with another player that refines the output of the generator. The generator is the first player, which learns to synthesize the depth image, while the second player (the discriminator) classifies the generated depth image. The third player, at the same time, is used to improve the depth reconstructed by the generator. In addition, to guide the generator in mapping the input image to its depth representation, we employ a conditional generative adversarial network (cGAN). Through extensive experimental validation, we confirmed the performance of our strategy on the publicly available indoor NYU Depth v2 dataset. We observed that our proposed method improves the accuracy of the generated depth and compares fairly well with several related techniques. |
10:20 | Distortion Correction and Stitching of Overlapping Cattle Barn Images ABSTRACT. With the increase in the scale of dairy farms and the popularity of stall-free barns, the management of individual dairy cows is becoming more difficult. The obvious solution, installing wide-angle cameras on the ceiling, makes it difficult to grasp the positions of the cows in the barn as the number of cameras increases, because the captured images are distorted and overlap. In this study, we aim to create a panoramic image that satisfies the following requirements: the dairy cows must not be cropped, duplicated or missed, and the final image must be effectively seamless. We propose a method that extracts the individual regions of the dairy cows and adds them to an underlying panoramic image. We conduct a user evaluation experiment and compare the proposed method with conventional methods such as multi-screen displays and a simple composition method. |
10:40 | Right Guarantee Method of Three-Dimensional Structure Created by Partial Polymerization ABSTRACT. In recent years, consumers have been able to create and release content on the Internet, and consumer-generated media (CGM) services have emerged. Because the digital content distributed by CGM services can be obtained and viewed freely, it is often put to secondary use. One target of such secondary use is 3D data for 3D printers, which is also digital content. Copyright protection technology based on digital signatures and suited to secondary use of content has been proposed. However, a single piece of 3D data may be constructed by three-dimensionally combining multiple pieces of data, each existing as its own unit. For 3D data that may have a three-dimensional mesh-net structure, a technology that guarantees which data is cited is important. In this paper, we propose a technology that guarantees the data citation process in 3D data. |
11:00 | Visualization of Physical Barriers in Pedestrian Space Using Photogrammetry-Based DEM PRESENTER: Koki Taniguchi ABSTRACT. Since the number of older adults has been increasing in Japan, there is an urgent need to create environments where vulnerable pedestrians and non-disabled people can live together. Barrier-free and universal design is being promoted around public facilities. However, away from these areas, many factors hinder the smooth passage of vulnerable pedestrians. This study proposes a method to visualize possible obstacles for vulnerable pedestrians, based on reconstruction of the pedestrian space by SfM from photographs taken on the sidewalk. This method successfully extracts 2D and 3D information about the barriers and integrates it into GIS for visualization over the internet. |
A Preliminary Study on the Effect of Moving Object Masking in 3D Model Reconstruction PRESENTER: Mitsuyasu Okamura ABSTRACT. 3D point cloud processing is one of the fields of computer vision, and has been applied to various fields in recent years. When reconstructing a 3D model from a video sequence, there is a problem that the accuracy of the reconstruction decreases if there is a moving object in the frame. In this work, we investigate a method to improve the accuracy of the reconstructed 3D model by extracting and masking the moving objects as a solution to this problem. The effectiveness of the proposed method is verified by experiments using several video sequences. |
A Preliminary Study on Shape Analysis of Remains Using 3D Features ABSTRACT. A 3D point cloud represents a 3D space as a large set of discrete coordinates acquired by laser scanning or photogrammetry. Since 3D point clouds can be processed to directly analyze features in 3D space, they are expected to have various applications. In this work, we conduct a preliminary study of shape analysis of 3D point cloud data of remains using three kinds of indices (DoN, curvature, and ND-PCA). The results of an experiment using 3D point cloud data obtained from actual remains are presented and discussed. |
The Prototype of Fish-Eye Lens Calibration Using Equiangular Markers PRESENTER: Koga Fukui ABSTRACT. Fish-eye cameras are often used to acquire full-dome images because of their wide viewing angle. In particular, a fish-eye lens with the equidistant projection model is suitable for developing the equirectangular cylindrical view image used in VR. Inexpensive mass-produced fish-eye lenses introduce projection errors due to lens distortion. In this paper, we propose a prototype method for estimating and calibrating lens distortion using equiangular markers. The images are taken in such a way that the markers are placed at equidistant positions between 0 and 90 degrees of the camera's viewing angle. If the lens follows an ideal equidistant projection model, the markers will be projected equidistantly in the captured image. Therefore, the distortion can be estimated and corrected by measuring the deviation from equidistant spacing in the actual captured image. |
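The idea in the abstract above can be illustrated with a small sketch. It assumes the standard equidistant model r = f · θ (image radius proportional to incidence angle) and fits f by least squares through the origin; the residuals then approximate the lens distortion at each marker. The function names, the marker angles, and the synthetic radii are all hypothetical and are not taken from the presented work.

```python
import math

def fit_equidistant_focal(thetas_deg, radii_px):
    """Least-squares fit of f in the ideal equidistant model r = f * theta."""
    thetas = [math.radians(t) for t in thetas_deg]
    num = sum(r * t for r, t in zip(radii_px, thetas))
    den = sum(t * t for t in thetas)
    return num / den

def radial_errors(thetas_deg, radii_px, f):
    """Deviation of each measured radius from the ideal equidistant radius."""
    return [r - f * math.radians(t) for t, r in zip(thetas_deg, radii_px)]

# Hypothetical setup: markers every 15 degrees from 0 to 90 degrees.
angles = [0, 15, 30, 45, 60, 75, 90]
ideal_f = 500.0  # assumed focal length in pixels per radian

# Synthetic "measured" radii with a small cubic distortion term (illustrative only).
measured = [ideal_f * math.radians(a) - 8.0 * math.radians(a) ** 3 for a in angles]

f_hat = fit_equidistant_focal(angles, measured)
errors = radial_errors(angles, measured, f_hat)
for a, e in zip(angles, errors):
    print(f"marker at {a:2d} deg: deviation {e:+.2f} px")
```

A correction could then be built by interpolating these deviations over the angle range; whether the presented prototype does this is not stated in the abstract.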
Quantification of Age-Related Skin Quality Using Ano-GAN Deep Learning Model ABSTRACT. Quantification of age-related skin quality using deep learning for anomaly detection is discussed. Color images were captured using a digital camera, and the corresponding UV images were generated using our previously proposed method with a U-Net deep learning model. The anomaly detection deep learning model called Ano-GAN was trained on UV skin images of a young subject as normal cases. The UV skin images of a middle-aged subject were input to the trained Ano-GAN model, and the anomaly scores were computed. This anomaly score is useful for quantifying age-related skin changes. |
Segmentation of HE Staining Images of Mouse Pancreatic Using U-Net ABSTRACT. The carcinogenic mechanism of the pancreas has not been completely elucidated, and early detection of pancreatic cancer is extremely difficult. If high-resolution three-dimensional (3D) anatomical models can be constructed during pancreatic carcinogenesis, they may help elucidate the carcinogenesis mechanism. In recent years, 3D reconstruction from high-resolution microscopic images of pathological tissue has been studied; however, few studies have focused on the pancreas. Since the microscopic image is huge, it is necessary to automate the segmentation. This study aims to segment mouse pancreatic cell images using U-Net as a pretreatment for the construction of the 3D anatomical model. We used partially manually segmented images as training data, created 10-class models with U-Net, and automatically segmented the entire mouse pathological image stained with hematoxylin and eosin. The image as a whole was segmented, although some small pancreatic ducts could not be extracted. |
Post-Capture Control in HDR Refocused Image with the Theory of Compressive Epsilon Photography ABSTRACT. A traditional camera requires the photographer to select many parameters when capturing an image. This paper presents a technique that achieves complete post-capture control of focus, aperture and exposure level in a traditional camera by acquiring a carefully selected set of 16 to 32 images, which is fewer than 1 percent of the number of reconstructed images. This technique also enables us to computationally reconstruct high dynamic range (HDR) images corresponding to all other focus and aperture settings. We show experimental results on several real data sets and openly provide the data sets. |
Evaluation of Self-Attention Approach in Hyperspectral Single-Pixel Classification ABSTRACT. Single-pixel classification aims at classifying a pixel in an image based solely on its pixel values without relying on the surrounding pixels. Since hyperspectral images (HSIs) have many spectral bands for each pixel, HSIs can take advantage of this more than RGB images in single-pixel classification. Most methods proposed in recent years for single-pixel classification of HSIs are based on Convolutional Neural Networks (CNNs) or Multi-Layer Perceptrons (MLPs). In this paper, we experiment with applying an attention-based method directly to hyperspectral single-pixel classification. Given a single pixel of an HSI, the attention layers can capture long-range relationships between spectral bands and explain which dependencies the model prediction relies on most. The experimental results indicate that the implemented attention-based approach is comparable to the state-of-the-art method in classification accuracy. |
Viewpoint Dependency of Attractiveness of Smiling Faces Generated by Impression Transformation of Morphable 3D Face Models ABSTRACT. We investigated the viewpoint dependency of the relationship between smile intensity and facial attractiveness. Smiling faces at several intensity levels were obtained by step-by-step impression transformation of a morphable 3D face model, and the observational perspective was virtually changed by rotating the model. Attractiveness of the faces was evaluated using Thurstone’s pairwise comparison method. The results show that observers find smiling faces most attractive when they look at the face straight on. When the smiling intensity increased significantly, however, the attractiveness decreased, and the loss of attractiveness was greater for female faces than for male faces. |
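As a rough illustration of how pairwise comparison judgments can be turned into attractiveness scale values, the sketch below applies Thurstone's Case V scaling. The abstract does not say which variant of Thurstone's method was used, so Case V is an assumption here, and the win counts are made-up numbers, not data from the study.

```python
from statistics import NormalDist

def thurstone_case_v(wins, clip=0.01):
    """Scale values from a pairwise win matrix (Thurstone Case V, assumed).

    wins[i][j] is the number of times stimulus i was preferred over j.
    Every pair is assumed to have been compared at least once.
    """
    n = len(wins)
    normal = NormalDist()
    scores = []
    for i in range(n):
        z_sum = 0.0
        for j in range(n):
            if i == j:
                continue  # a stimulus compared with itself contributes z = 0
            p = wins[i][j] / (wins[i][j] + wins[j][i])
            # Clip proportions so the inverse normal CDF stays finite.
            p = min(max(p, clip), 1.0 - clip)
            z_sum += normal.inv_cdf(p)
        scores.append(z_sum / n)
    return scores

# Hypothetical counts for three smile intensities judged 10 times per pair.
example_wins = [
    [0, 8, 9],
    [2, 0, 7],
    [1, 3, 0],
]
scale_values = thurstone_case_v(example_wins)
print(scale_values)
```

The resulting values lie on an interval scale centered at zero, so only differences between stimuli are meaningful.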
Automatic Extraction of Speech Segments from Motion Pictures by Time-Series Clustering of Visual Feature Points ABSTRACT. We aimed to extract speech segments for each vowel by applying a time-series clustering method to facial video images in which different vowels were uttered continuously. We experimentally confirmed that newly uttered vowels were accurately identified when the video images in each extracted segment were used as training samples. In our experiments, we tested two time-series clustering methods, k-Shape and TICC, and confirmed that TICC performs better than k-Shape. |
Semantics-Aware Color Palette Generation for Graphic Designs ABSTRACT. In this paper, we present a method that generates a color palette for colorizing a graphic design template while considering the semantics of the input image. To build a dataset of text and palette pairs, we first collect multiple graphic designs for each text using Google Image Search. Then, from the collected images, we extract the color palettes used to colorize the graphic designs. We compare and discuss the color palettes generated by several methods in terms of palette quality and generation time. Our method generates colors associated with the input text that names each layer of the input image. Therefore, we can help novice designers obtain and colorize diverse designs efficiently. |
The Relationship Between the Variation in Line Drawing and the Reaction Delay RMS in a Simulated Driving Task ABSTRACT. In this study, we investigate the relationship between the variation (consistency) of line drawing based on point clouds and the reaction delay RMS (root mean square) in a simulated driving task. To examine this relationship, 23 participants performed a line drawing task three times and a simulated driving task, and the correlation was computed. The results suggest that inconsistency of line drawing is positively correlated with the reaction delay RMS, with a correlation coefficient of about 0.7. The results of multiple regression analysis show that the consistency of the line drawing can predict the reaction delay RMS with a coefficient of determination of 0.50. The results suggest that participants who tend to change their criteria for drawing lines on a given point cloud are also likely to show a large reaction delay RMS in a simulated driving task. |
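For readers who want to see how such a correlation is computed, here is a minimal sketch of the Pearson correlation coefficient. The abstract does not name the coefficient used, so Pearson is an assumption, and the per-participant values below are invented for illustration. For a single predictor, the squared correlation equals the coefficient of determination of a simple linear regression, which is why r of about 0.7 and R² of about 0.5 are of similar magnitude.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: line-drawing inconsistency vs. reaction delay RMS (seconds).
inconsistency = [0.8, 1.1, 1.5, 0.9, 2.0, 1.7, 1.2]
delay_rms = [0.21, 0.25, 0.33, 0.27, 0.38, 0.30, 0.24]

r = pearson_r(inconsistency, delay_rms)
print(f"r = {r:.2f}, r^2 = {r * r:.2f}")
```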
Attribute Preserved Face De-Identification by Using Conditional Generative Adversarial Network ABSTRACT. Technology that prevents a person from being identified from face images is necessary for privacy protection in many fields, such as social networking and medical records. In this paper, we propose a method that transforms a face image into a face image of another person while preserving the attributes of the original, in order to anonymize the face image while retaining as much non-private information as possible. In our experiments, we combined attribute prediction and a cGAN (conditional Generative Adversarial Network) to generate a new face image that preserves the attributes of the input face image, and quantitatively evaluated the consistency of the attributes between the input image and the generated image. |
Arbitrary Viewpoint Omnidirectional Image Generation Based on Spherical Light Field Using SLAM ABSTRACT. Today, with the development of VR technology and the widespread use of omnidirectional cameras, it has become possible to generate an omnidirectional virtual space. However, it is not possible to move the viewpoint position when the virtual space is generated from sparse images. To solve this problem, image-based rendering techniques can be used to generate arbitrary-viewpoint omnidirectional images. In this study, we apply SLAM to omnidirectional video to estimate camera positions and generate omnidirectional images using a spherical light field. The camera calibration procedure is performed automatically. |
Reproduction of Takigi Noh Based on Anisotropic Reflection Rendering of Noh Costume with Dynamic Illumination ABSTRACT. Takigi Noh is performed with torches around the Noh stage from sunset to night in summer. Four torch flames around the Noh stage make the performer's Noh costume shine beautifully. In this study, we reproduce Takigi Noh in a virtual reality space. First, we measured a Noh costume fabric with an omnidirectional anisotropic reflectance measurement system called the optical gyro measuring machine (OGM), and generated a bidirectional texture function (BTF) of the Noh costume based on multi-directional illumination high dynamic range (HDR) image analysis. Second, we rendered the performer's Noh costume with image-based lighting (IBL) using environment maps based on a lighting texture of the background around the Noh stage and a dynamic lighting texture of the flames. In addition, we reproduced Takigi Noh using measured data such as the Noh stage, the Noh mask and the motion of the performers. |
15:30 | Image Data Augmentation Based on Cramér Generative Adversarial Networks on Retinal Images for Hard Exudates Detection ABSTRACT. Hard exudates can be seen in any condition associated with chronic vascular leakage, and they cause decreased vision if the macula is involved. Thus, we have been developing a hard exudate detection method using a patch-based Convolutional Neural Network (CNN). To improve an imbalanced dataset of hard exudate and normal tissue images, we present a data augmentation of the lesion images by Cramér Generative Adversarial Networks, which offer great diversity and stable learning. With the balanced dataset, the accuracy of the CNN is approximately 3% higher than with the imbalanced dataset. |
15:50 | Application of a Deep Neural Network to Determine the Rate of Glomerular Sclerosis ABSTRACT. Whole Slide Images allow automatic procedures in histopathology to quantify the total number of glomeruli and the rate of glomerular sclerosis as an indicator of damage with the objective of sorting slides. In this work, the usage of Deep Learning is proposed in the detection and classification of glomeruli. For training and validation, this work used 30 complete slides including 585 sclerosed and 3383 functional annotated glomeruli. This work obtained a recall of 96.8%, precision of 95.9%, accuracy of 98.1% and an F1 score of 96.3%. A system was proposed and validated to identify the percentage of sclerosed glomeruli, allowing support for study of nephropathies. |
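As a quick consistency check on the figures in the abstract above, the F1 score is the harmonic mean of precision and recall. The sketch below recomputes it from the reported precision and recall, and also computes the fraction of sclerosed glomeruli among the annotated ones. The function names are illustrative, and the second figure describes the annotated dataset only, not the output of the proposed system.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def sclerosis_rate(sclerosed, functional):
    """Fraction of glomeruli that are sclerosed."""
    return sclerosed / (sclerosed + functional)

# Precision and recall as reported in the abstract.
reported_f1 = f1_score(precision=0.959, recall=0.968)

# Annotated counts as reported in the abstract.
dataset_rate = sclerosis_rate(sclerosed=585, functional=3383)

print(f"F1 from reported precision/recall: {reported_f1:.3f}")
print(f"Sclerosed fraction in annotated data: {dataset_rate:.3f}")
```

The recomputed F1 rounds to 0.963, which agrees with the 96.3% stated in the abstract.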
16:10 | Registration of Histopathological Heterogeneous Stained Images Utilizing GAN Based Domain Adaptation Technique ABSTRACT. Registration of histopathological images obtained with different staining techniques is very challenging because their color information differs greatly. In this study, we propose an image registration method that overcomes the color difference between H&E- and EVG-stained images by means of GAN-based color conversion. The proposed method consists of two main parts: a GAN-based unsupervised domain adaptation network that converts an H&E-stained image into an EVG-stained image whose distribution is similar to that of the original EVG-stained image, and a SURF feature-based registration framework that produces the registered EVG-stained image by leveraging the generated EVG-stained image from the domain adaptation network. The experimental results show that the proposed method provides better registration results than the conventional method, which does not incorporate domain adaptation. |
16:30 | Preliminary Study on Extraction of Cervical Intervertebral Disks from Videofluorography During Swallowing by Use of Multi-Channelization and M-Net PRESENTER: Erika Gunji ABSTRACT. Dysphagia may make it difficult for patients to swallow food and drink, and lower their quality of life. The mechanism of dysphagia has not been elucidated yet. It is necessary to analyze the morphology and dynamics of cervical structures such as epiglottises and cervical intervertebral disks (CIDs). This study proposes a method for segmenting CIDs from videofluorography (VF) during swallowing by use of M-Net, our multi-channelization (MC) technique, and image feature selection. The MC technique converts the frame images of VF to feature images by using non-linear as well as linear filters, and inputs these feature images to M-Net. The combination of the feature images is optimized by simulated annealing. The proposed method was applied to 58 actual VF recordings, and segmentation accuracy was evaluated by pixel-wise F-measure. The F-measure of the conventional M-Net was 0.730, whereas that of our segmentation method was 0.734. |
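The pixel-wise F-measure used to score the segmentation above can be sketched as follows, assuming binary masks in which 1 marks a pixel of the target structure. The function name and the tiny example masks are hypothetical; the abstract does not describe how the score was averaged over frames or recordings.

```python
def pixelwise_f_measure(pred, truth):
    """F-measure between two binary masks given as lists of rows.

    Returns 0.0 when there are no true-positive pixels.
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1  # predicted and actually part of the structure
            elif p and not t:
                fp += 1  # predicted but not part of the structure
            elif t and not p:
                fn += 1  # part of the structure but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical 2x3 masks for illustration.
predicted = [[1, 1, 0],
             [0, 1, 0]]
ground_truth = [[1, 0, 0],
                [0, 1, 1]]

score = pixelwise_f_measure(predicted, ground_truth)
print(f"pixel-wise F-measure: {score:.3f}")
```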
16:50 | Preliminary Study on Extraction of Epiglottises from Videofluorography with U-Net PRESENTER: Ayami Sugita ABSTRACT. In this paper, we propose a method for extracting epiglottises from videofluorography by use of U-Net. Three frame images are selected at one-second intervals from each videofluorography recording. In the second frame image, the hyoid bone is at its highest position. Epiglottises are manually extracted from the frame images as ground truth under the supervision of a medical doctor. The U-Net is trained with these data after applying affine-based data augmentation, and then extracts the candidate regions of epiglottises from test data. The proposed method is applied to 19 actual videofluorography recordings, and the extraction accuracy is evaluated by pixel-wise F-measure. Several experimental results are shown. |
17:10 | A Feature Value for Measuring Progression of Gastric Atrophy Utilizing the Distribution of Folds in Gastric X-Ray Images ABSTRACT. This paper presents a feature value for measuring the progression of gastric atrophy from gastric X-ray images. In the proposed approach, after the target area for diagnosis is determined and the gastric folds are extracted from the images, the feature is extracted from the area based on the diagnostic index used for reading atrophy from the images. Concretely, the feature measures the quantity of folds in the stomach region near the lesser curvature. Experiments examining the performance of the feature were conducted on 68 images, and the results showed that the feature is effective for measuring the progression. |