IEVC2021: THE 7TH IIEEJ INTERNATIONAL CONFERENCE ON IMAGE ELECTRONICS AND VISUAL COMPUTING
PROGRAM FOR FRIDAY, SEPTEMBER 10TH

10:00-11:20 Session 5A: Computer Vision & 3D Image Processing
Location: Room A
10:00
Accuracy Improvement of Depth Estimation from Single Image by Using 3rd Player in GAN

ABSTRACT. We propose an adversarial network for monocular depth estimation that synthesizes a depth map from a single RGB input image. Differing from the regular generative adversarial network architecture, we extend the network with another player that refines the output of the generator. Specifically, the generator is the first player, which learns to synthesize the depth image, while the second player (the discriminator) classifies the generated depth image. The third player is used, at the same time, to improve the depth reconstructed by the generator. In addition, to guide the generator in mapping the input image to the corresponding depth representation, we employ a conditional generative adversarial network (cGAN). Through extensive experimental validation on the publicly available indoor NYU Depth v2 dataset, we confirmed the performance of our strategy. We observed that our proposed method improves the accuracy of the generated depth and compares favorably with several related techniques.

10:20
Distortion Correction and Stitching of Overlapping Cattle Barn Images

ABSTRACT. With the increase in the scale of dairy farms and the popularity of stall-free barns, the management of individual dairy cows is becoming more difficult. The obvious solution, installing wide-angle cameras on the ceiling, makes it hard to grasp the positions of the cows in the barn as the number of cameras increases, due to distortion and overlap among the captured images. In this study, we aim to create a panoramic image that satisfies the following requirements: the dairy cows must not be cropped, duplicated, or missed, and the final image must be effectively seamless. We propose a method that extracts the individual regions of the dairy cows and adds them to an underlying panoramic image. We conduct a user evaluation experiment and compare the proposed method with conventional methods such as multi-screen displays and a simple composition method.

10:40
Right Guarantee Method of Three-Dimensional Structure Created by Partial Polymerization

ABSTRACT. In recent years, consumers have become able to create and release contents over the Internet, and services known as consumer-generated media (CGM) have emerged. Because the digital contents distributed by such CGM services can be obtained and viewed freely, they are often reused in secondary applications. One such form of secondary use is 3D data for 3D printers, which is itself digital content. Copyright protection technology suitable for secondary use of contents using digital signatures has been proposed. However, a single piece of 3D data may be constructed by three-dimensionally combining multiple pieces of data, each existing as a separate unit. For 3D data that may have such a three-dimensional mesh-like structure, a technology that guarantees which data have been cited is important. In this paper, we propose a technology that guarantees the data citation process in 3D data.

11:00
Visualization of Physical Barriers in Pedestrian Space Using Photogrammetry-Based DEM
PRESENTER: Koki Taniguchi

ABSTRACT. Since the number of older adults in Japan has been increasing, creating environments where vulnerable pedestrians and non-disabled people can live together is urgently needed. Barrier-free and universal design is being promoted around public facilities. However, away from these areas, many factors hinder the smooth passage of vulnerable pedestrians. This study proposes a method to visualize possible obstacles for vulnerable pedestrians, based on a reconstruction of the pedestrian space by structure from motion (SfM) from photographs taken on the sidewalk. The method successfully extracts 2D and 3D information on the barriers and integrates it into a GIS for visualization over the Internet.

10:00-11:20 Session 5B: Virtual, Augmented, and Mixed Reality
Location: Room B
10:00
A Study of Information Presentation Method for Cooking Using HoloLens2

ABSTRACT. In recent years, the field of xR has begun to spread into the wider world, is being used in a variety of situations, and is attracting increasingly active research. Cooking is one of our day-to-day activities, and we thought that daily life would be more convenient if visual information displayed as virtual objects could assist us while cooking. In this paper, we examine and investigate how users would like to be supported during cooking by MR visual information presented via HoloLens2. To provide visual support during cooking, we developed an application that runs on HoloLens2, built with Unity, that displays videos of cooking recipes and helps users cook smoothly without taking their attention away from the cooking at hand. We prepared three presentation patterns of the cooking recipe for users to view: "fixed," "free arrangement," and "tracking," and conducted a survey in which subjects were asked to evaluate and rank each pattern. The "free arrangement" pattern, which allowed the user to freely change the recipe's position and size, was found to be the best, with a statistical test indicating that the "intuitive" evaluation item is a contributing factor. However, there were no significant differences in the evaluation items among the other patterns in this experiment, so we would like to increase the number of samples for verification in the future. In addition, since HoloLens2 is not yet a commonly used device, there is a bias when comparing its operability with that of devices such as smartphones and tablets that are in daily use. If MR technology develops further and HMDs equipped with xR come into daily use, the results are expected to differ from this comparison.

10:20
A Research on the Playability of TRPG Using VR Technology

ABSTRACT. In recent years, contents using 3D technologies have become common in TV, movies, and games. Furthermore, with the growing attention to HMDs such as Oculus Quest and PlayStation VR, VR technology is being combined with various genres of games. On the other hand, there are few examples of VR technology adapted to TRPGs, and it is not a common genre of VR games. However, the use of VR technologies in TRPGs is expected to make traditional TRPGs even more enjoyable. Therefore, in this study, through evaluation experiments comparing a Cthulhu Mythos TRPG with VR technology, a traditional face-to-face TRPG, and an online TRPG, we examine which is the most game-like and discuss the effect of VR on the enjoyment of the game.

10:40
Landing Impact Sound Changed the Sense of Weight of Objects in Virtual Reality

ABSTRACT. In virtual reality, it is necessary to clarify what kind of auditory stimulus is effective for presenting a pseudo-haptic sensation through the cross-modal phenomenon. We changed the pitch of the metallic sound that occurs when a dumbbell is dropped, and recorded the impression that the research participants received from the falling sound as well as the weight they perceived when using the dumbbell. The results suggest that when the impression received from the images, including the sound stimuli, is "noisy," "strong," "dynamic," "clear," and "sharp," the object feels heavier.

11:00
Improved Projection Mapping for Reproducing Proper Occlusion and Grasping Positional Relationship Between Mid-Air 3DCG Images and a Hand

ABSTRACT. We have developed a method to observe a stereoscopic 3DCG image in mid-air using a micro mirror array plate (MMAP) and to interact with the image directly by hand. Our method resolves the conflict of occlusion between mid-air 3DCG images and the user's hand during interaction by projecting an image on the user's hand to reproduce the correct occlusion. It is important to grasp the positional relationship between the object and the hand in direct interaction. We improve our former method so that the positional relationship between a mid-air image and a hand can be grasped more clearly during direct interaction. Our method changes the color and density of the image projected on the hand for reproducing the occlusion according to the positional relationship between the mid-air image and the hand. A shadow of the mid-air image is also projected on the hand. Our method could also be useful for expressing the transparency of mid-air 3DCG images.

14:15-15:15 Session LBP: Late Breaking Papers
A Preliminary Study on the Effect of Moving Object Masking in 3D Model Reconstruction

ABSTRACT. 3D point cloud processing is one area of computer vision and has been applied in various domains in recent years. When reconstructing a 3D model from a video sequence, the accuracy of the reconstruction decreases if there is a moving object in the frame. In this work, we investigate a method to improve the accuracy of the reconstructed 3D model by extracting and masking the moving objects as a solution to this problem. The effectiveness of the proposed method is verified by experiments using several video sequences.
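
A minimal sketch of the masking idea described above, assuming a per-frame binary mask of moving objects is already available (for example from a segmentation network); the function and variable names are illustrative, not the authors' implementation:

```python
import cv2
import numpy as np

def masked_keypoints(frame_gray: np.ndarray, moving_mask: np.ndarray):
    """Detect ORB features only in static regions of the frame.

    moving_mask: uint8 array, 255 where a moving object was detected.
    """
    static_mask = cv2.bitwise_not(moving_mask)         # keep only static pixels
    orb = cv2.ORB_create(nfeatures=2000)
    # detectAndCompute accepts an optional mask restricting detection,
    # so features on moving objects never enter the reconstruction pipeline.
    keypoints, descriptors = orb.detectAndCompute(frame_gray, static_mask)
    return keypoints, descriptors
```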

A Preliminary Study on Shape Analysis of Remains Using 3D Features

ABSTRACT. A 3D point cloud represents a 3D space as a large set of discrete coordinates acquired by laser scanning or photogrammetry. Since 3D point clouds can be processed to directly analyze features in 3D space, they are expected to have various applications. In this work, we conduct a preliminary study on shape analysis of 3D point cloud data of remains using three kinds of indices (DoN, curvature, and ND-PCA). The results of an experiment using 3D point cloud data obtained from actual remains are presented and discussed.

The Prototype of Fish-Eye Lens Calibration Using Equiangular Markers
PRESENTER: Koga Fukui

ABSTRACT. Fish-eye cameras are often used to acquire full-dome images because of their wide viewing angle. In particular, a fish-eye lens following the equidistant projection model is suitable for producing the equirectangular (cylindrical) view images used in VR. However, inexpensive mass-produced fish-eye lenses introduce projection errors due to lens distortion. In this paper, we propose a prototype method for estimating and calibrating lens distortion using equiangular markers. Images are taken with the markers placed at equal angular intervals between 0 and 90 degrees of the camera's viewing angle. If the lens follows the ideal equidistant projection model, the markers are projected at equally spaced positions in the captured image. Therefore, the distortion can be estimated and corrected by measuring the deviation from equal spacing in the actual captured image.
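
As a rough sketch of the calibration principle (our illustration, not the authors' code): under the ideal equidistant model r = f·θ, markers at equal angular steps should appear at equally spaced image radii, so a correction can be fitted from the measured radii. The radii below are invented example values.

```python
import numpy as np

theta = np.deg2rad(np.arange(0, 91, 10))            # marker angles, 0..90 deg
# Measured marker radii in pixels (illustrative, slightly distorted values):
r_measured = np.array([0.0, 122.0, 246.0, 372.0, 501.0,
                       633.0, 770.0, 912.0, 1060.0, 1215.0])

f = r_measured[-1] / theta[-1]                      # focal scale from the 90-deg marker
r_ideal = f * theta                                 # ideal equidistant radii

# Fit a low-order polynomial mapping measured radii to ideal radii;
# applying it to each pixel's radius undistorts the image.
coeffs = np.polyfit(r_measured, r_ideal, deg=3)
correct_radius = np.poly1d(coeffs)
print(correct_radius(r_measured) - r_ideal)         # residual error of the fit
```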

Quantification of Age-Related Skin Quality Using Ano-GAN Deep Learning Model

ABSTRACT. The quantification of age-related skin quality using deep learning for anomaly detection is discussed. Color images were captured using a digital camera, and the corresponding UV images were generated using our previously proposed method based on a U-Net deep learning model. The anomaly detection model, Ano-GAN, was trained on UV skin images of a young subject as normal cases. UV skin images of a middle-aged subject were then input to the trained Ano-GAN model, and anomaly scores were computed. This anomaly score is useful for quantifying age-related skin changes.

Segmentation of HE Staining Images of Mouse Pancreatic Using U-Net

ABSTRACT. The carcinogenic mechanism of the pancreas has not been completely elucidated, and early detection of pancreatic cancer is extremely difficult. If high-resolution three-dimensional (3D) anatomical models can be constructed during pancreatic carcinogenesis, they may help elucidate the carcinogenesis mechanism. In recent years, 3D reconstruction from high-resolution microscopic images of pathological tissue has been studied; however, few studies have focused on the pancreas. Since the microscopic images are huge, it is necessary to automate the segmentation. This study aims to segment mouse pancreatic cell images using U-Net as preprocessing for the construction of the 3D anatomical model. We provided partially hand-segmented images as teaching data, created 10-class models with U-Net, and automatically segmented entire mouse pathological images stained with hematoxylin and eosin (HE). The results were segmented well as a whole, although some small pancreatic ducts could not be extracted.

Post-Capture Control in HDR Refocused Image with the Theory of Compressive Epsilon Photography

ABSTRACT. A traditional camera requires the photographer to select many parameters at capture time. This paper presents a technique that achieves complete post-capture control of focus, aperture, and exposure level in a traditional camera by acquiring a carefully selected set of 16 to 32 images, which is less than 1 percent of the number of reconstructed images. This technique enables us to computationally reconstruct high dynamic range (HDR) images corresponding to all other focus and aperture settings. We show experimental results on several real data sets and openly provide the data sets.

Evaluation of Self-Attention Approach in Hyperspectral Single-Pixel Classification

ABSTRACT. Single-pixel classification aims at classifying a pixel in an image based solely on its pixel values, without relying on the surrounding pixels. Since hyperspectral images (HSIs) have many spectral bands for each pixel, HSIs can take greater advantage of this than RGB images in single-pixel classification. Most previous methods for single-pixel classification of HSIs have been based on Convolutional Neural Networks (CNNs) or Multi-Layer Perceptrons (MLPs). In this paper, we experiment with applying an attention-based method directly to hyperspectral single-pixel classification. Given a single pixel of an HSI, the attention layers can capture long-range relationships between spectral bands and explain which dependencies the model prediction relies on most heavily. The experimental results indicate that the implemented attention-based approach is comparable to the state-of-the-art method in classification accuracy.
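
A minimal sketch of this kind of attention-based model (our assumed architecture, not necessarily the paper's): each spectral band of a pixel is embedded as a token and a self-attention layer models inter-band dependencies.

```python
import torch
import torch.nn as nn

class SpectralAttentionClassifier(nn.Module):
    def __init__(self, n_bands=200, d_model=64, n_heads=4, n_classes=16):
        super().__init__()
        self.embed = nn.Linear(1, d_model)                     # per-band scalar -> token
        self.pos = nn.Parameter(torch.zeros(1, n_bands, d_model))
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):                                      # x: (batch, n_bands)
        tokens = self.embed(x.unsqueeze(-1)) + self.pos        # (batch, n_bands, d_model)
        attended, weights = self.attn(tokens, tokens, tokens)  # self-attention over bands
        return self.head(attended.mean(dim=1)), weights        # class logits + attention map

model = SpectralAttentionClassifier()
logits, attn = model(torch.randn(8, 200))                      # 8 pixels, 200 bands each
```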

Viewpoint Dependency of Attractiveness of Smiling Faces Generated by Impression Transformation of Morphable 3D Face Models

ABSTRACT. We investigated the viewpoint dependency of the relationship between the intensity of a smile and the attractiveness of faces. Smiling faces of several intensity levels were obtained by step-by-step impression transformation of a morphable 3D face model, and the observational perspective was virtually changed by rotating the model. The attractiveness of the faces was evaluated using Thurstone's pairwise comparison method. The results show that observers find smiling faces most attractive when they look the face straight in the eye. When the smiling intensity increased significantly, however, the attractiveness decreased, and the loss of attractiveness was greater for female faces than for male faces.
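
For reference, a textbook sketch of Thurstone's Case V scaling as applied to pairwise attractiveness judgments (the proportions below are illustrative, not the study's data):

```python
import numpy as np
from scipy.stats import norm

# p[i, j]: proportion of trials in which face j was judged more attractive than
# face i (four illustrative smile-intensity levels).
p = np.array([[0.50, 0.70, 0.80, 0.60],
              [0.30, 0.50, 0.65, 0.45],
              [0.20, 0.35, 0.50, 0.30],
              [0.40, 0.55, 0.70, 0.50]])

z = norm.ppf(np.clip(p, 0.01, 0.99))    # proportions -> z-scores
scale = z.mean(axis=0)                   # Case V scale value per face
scale -= scale.min()                     # anchor the least attractive face at 0
print(scale)
```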

Automatic Extraction of Speech Segments from Motion Pictures by Time-Series Clustering of Visual Feature Points

ABSTRACT. We aimed to extract speech segments for each vowel by applying a time-series clustering method to facial video images in which different vowels were uttered continuously. We experimentally confirmed that newly uttered vowels were accurately identified when the video images in each extracted segment were used as training samples. In our experiments, we tested two time-series clustering methods, k-Shape and TICC. We confirmed that TICC provides better performance than k-Shape.
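
A hedged sketch of the clustering step, assuming the tslearn library for k-Shape (TICC has a separate reference implementation and is omitted here); the feature trajectories below are random placeholders:

```python
import numpy as np
from tslearn.clustering import KShape
from tslearn.preprocessing import TimeSeriesScalerMeanVariance

# X: (n_sequences, n_frames, 1), e.g. one visual feature-point trajectory per clip
X = np.random.rand(60, 120, 1)
X = TimeSeriesScalerMeanVariance().fit_transform(X)    # z-normalise each series

labels = KShape(n_clusters=5, random_state=0).fit_predict(X)
print(labels[:10])                                      # cluster id assigned to each clip
```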

Semantics-Aware Color Palette Generation for Graphic Designs

ABSTRACT. In this paper, we present a method to generate a color palette that takes into account the semantics of the input image in order to colorize graphic design templates. To build a dataset of text and palette pairs, we first collect multiple graphic designs for the texts using Google Image Search. Then, from the collected images, we extract the color palettes used to colorize the graphic designs. We compare and discuss the color palettes generated by several methods in terms of palette quality and generation time. Our method generates colors associated with the input text given for each layer of the input image. Therefore, we can assist novice designers in obtaining and colorizing diverse designs efficiently.
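
One common way to realize the palette-extraction step, shown purely as a sketch (the file name and cluster count are illustrative; the authors' extraction method may differ):

```python
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

img = np.asarray(Image.open("collected_design.png").convert("RGB"))
pixels = img.reshape(-1, 3).astype(float)

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(pixels)
palette = kmeans.cluster_centers_.astype(np.uint8)      # five representative RGB colours
print(palette)
```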

The Relationship Between the Variation in Line Drawing and the Reaction Delay Rms in a Simulated Driving Task

ABSTRACT. In this study, we investigate the relationship between the variation (consistency) of line drawing based on point clouds and the reaction delay RMS (root mean square) in a simulated driving task. To examine this relationship, 23 participants performed a line drawing task three times and a simulated driving task, and the correlation was computed. The results suggest that the inconsistency of line drawing is positively correlated with the reaction delay RMS, with a correlation coefficient of about 0.7. The results of a multiple regression analysis show that the consistency of the line drawing can predict the reaction delay RMS with a coefficient of determination of 0.50. These results suggest that participants who tend to change their criteria for drawing lines in a given point cloud are also likely to show a large reaction delay RMS in a simulated driving task.
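
An illustrative computation of the two quantities related in the abstract (simulated numbers, and simple rather than multiple regression, just to make the measures concrete):

```python
import numpy as np

rng = np.random.default_rng(0)
inconsistency = rng.random(23)                             # line-drawing inconsistency per participant
delays = rng.random((23, 40)) * 0.3 + inconsistency[:, None] * 0.5  # simulated reaction delays (s)

delay_rms = np.sqrt(np.mean(delays ** 2, axis=1))          # RMS of each participant's delays
r = np.corrcoef(inconsistency, delay_rms)[0, 1]            # Pearson correlation

slope, intercept = np.polyfit(inconsistency, delay_rms, 1) # simple linear regression
print(f"r = {r:.2f}, R^2 = {r ** 2:.2f}")                  # R^2 equals r^2 for a single predictor
```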

Attribute Preserved Face De-Identification by Using Conditional Generative Adversarial Network

ABSTRACT. Technology that prevents a person from being identified from face images is necessary for privacy protection in many fields such as social networking and medical records. In this paper, we propose a method to transform a face image into the face image of another person while preserving the attributes of the original, so as to anonymize the face image while preserving as much non-identifying information as possible. In our experiments, we combined attribute prediction and a cGAN (conditional Generative Adversarial Network) to generate a new face image that preserves the attributes of the input face image, and quantitatively evaluated the consistency of the attributes between the input image and the generated image.

Arbitrary Viewpoint Omnidirectional Image Generation Based on Spherical Light Field Using SLAM

ABSTRACT. Today, with the development of VR technology and the widespread use of omnidirectional cameras, it has become possible to generate omnidirectional virtual spaces. However, it is not possible to move the viewpoint position when the virtual space is generated from sparse images. Image-based rendering techniques for generating arbitrary-viewpoint omnidirectional images can solve this problem. In this study, we use SLAM on omnidirectional video to estimate camera positions and generate omnidirectional images using a spherical light field. The camera calibration procedure is processed automatically.

Reproduction of Takigi Noh Based on Anisotropic Reflection Rendering of Noh Costume with Dynamic Illumination

ABSTRACT. Takigi Noh is performed with torches around the Noh stage from sunset to night in summer. Four torch flames around the stage make the performer's Noh costume shine beautifully. In this study, we reproduce Takigi Noh in a virtual reality space. First, we measured a Noh costume fabric with an omnidirectional anisotropic reflectance measurement system called the optical gyro measuring machine (OGM), and generated a bidirectional texture function (BTF) of the Noh costume based on multi-directional illumination high dynamic range (HDR) image analysis. Second, we rendered the performer's Noh costume with image-based lighting (IBL), using environment maps based on the lighting texture of the background around the Noh stage and the dynamic lighting texture of the flames. In addition, we reproduced Takigi Noh using measured data such as the Noh stage, Noh masks, and the motion of performers.

15:30-17:30 Session 6A: Medical Imaging
Location: Room A
15:30
Image Data Augmentation Based on Cramér Generative Adversarial Networks on Retinal Images for Hard Exudates Detection

ABSTRACT. Hard exudates can be seen in any condition associated with chronic vascular leakage, and they cause decreased vision if the macula is involved. We have therefore been developing a hard exudate detection method using a patch-based Convolutional Neural Network (CNN). To address the imbalance between hard exudate and normal tissue images in the dataset, we present a data augmentation of the lesion images by Cramér Generative Adversarial Networks, which offer great diversity and stable learning. With the balanced dataset, the accuracy of the CNN is approximately 3% higher than with the imbalanced dataset.

15:50
Application of a Deep Neural Network to Determine the Rate of Glomerular Sclerosis

ABSTRACT. Whole slide images allow automatic procedures in histopathology to quantify the total number of glomeruli and the rate of glomerular sclerosis as an indicator of damage, with the objective of sorting slides. In this work, the use of deep learning is proposed for the detection and classification of glomeruli. For training and validation, this work used 30 complete slides including 585 sclerosed and 3,383 functional annotated glomeruli. This work obtained a recall of 96.8%, a precision of 95.9%, an accuracy of 98.1%, and an F1 score of 96.3%. A system was proposed and validated to identify the percentage of sclerosed glomeruli, providing support for the study of nephropathies.

16:10
Registration of Histopathological Heterogeneous Stained Images Utilizing GAN-Based Domain Adaptation Technique

ABSTRACT. Registration of histopathological images obtained with different staining techniques is very challenging because of the large difference in their color information. In this study, we propose a promising image registration method that overcomes the color difference between H&E and EVG stained images by means of GAN-based color conversion. Our proposed method consists of two main parts: one is a GAN-based unsupervised domain adaptation network that converts an H&E stained image into an EVG stained image whose distribution is similar to that of the original EVG stained image, and the other is a SURF feature based registration framework that produces the registered EVG stained image by leveraging the generated EVG stained image obtained from the domain adaptation network. The experimental results show that our proposed method provides better registration results than the conventional method, in which the domain adaptation technique is not incorporated.

16:30
Preliminary Study on Extraction of Cervical Intervertebral Disks from Videofluorography During Swallowing by Use of Multi-Channelization and M-Net
PRESENTER: Erika Gunji

ABSTRACT. Dysphagia may make it difficult for patients to swallow food and drink and lower their quality of life. The mechanism of dysphagia has not yet been elucidated. It is therefore necessary to analyze the morphology and dynamics of cervical structures such as the epiglottis and the cervical intervertebral disks (CIDs). This study proposes a segmentation method for CIDs in videofluorography (VF) during swallowing using M-Net, our multi-channelization (MC) technique, and image feature selection. The MC technique converts the frame images of VF into feature images using non-linear as well as linear filters, and inputs these feature images to M-Net. The combination of feature images is optimized by simulated annealing. The proposed method was applied to 58 actual VF sequences, and the segmentation accuracy was evaluated by pixel-wise F-measure. The F-measure of the conventional M-Net was 0.730, whereas that of our segmentation method was 0.734.

16:50
Preliminary Study on Extraction of Epiglottises from Videofluorography with U-Net
PRESENTER: Ayami Sugita

ABSTRACT. In this paper, we propose a method for extracting epiglottises from videofluorography using U-Net. Three frame images are selected at one-second intervals from each videofluorography sequence; in the second frame image, the hyoid bone is at its highest position. Epiglottises are manually extracted from the frame images as ground truth under the supervision of a medical doctor. The U-Net is trained with these data after affine-based data augmentation, and then extracts candidate regions of epiglottises from the test data. The proposed method was applied to 19 actual videofluorography sequences, and the extraction accuracy was evaluated by pixel-wise F-measure. Several experimental results are shown.

17:10
A Feature Value for Measuring Progression of Gastric Atrophy Utilizing the Distribution of Folds in Gastric X-Ray Images

ABSTRACT. This paper presents a feature value for measuring the progression of gastric atrophy from gastric X-ray images. In the proposed method, after the target area for diagnosis is determined and the gastric folds are extracted from the images, the feature is computed from that area based on the diagnostic index for reading atrophy from the images. Concretely, the feature measures the quantity of folds in the stomach region near the lesser curvature. Experiments examining the performance of the feature were conducted on 68 images, and the results showed that the feature is effective for measuring the progression.

15:30-17:30 Session 6B: (Special session) Safe and Secure Society
Location: Room B
15:30
Invertible Fingerprint Replacement for Image Privacy Protection

ABSTRACT. The demand for privacy protection has been increasing with the widespread use of devices that can easily take high-resolution images, such as digital cameras and smartphones. In particular, fingerprint information is one of the targets of privacy protection. In this paper, we propose a method for reversibly replacing fingerprints in an image with another, fake fingerprint. This method makes it possible to automatically remove the original fingerprint information in the input image and generate a natural image. Moreover, the input image's fingerprint information can be easily restored from the output image, but only by specific persons who know the key used in the image generation process.

15:50
Analysis of Japanese Tweets with Hashtags Related to COVID-19

ABSTRACT. The novel coronavirus pandemic (COVID-19) is having serious global impacts. This has led many people to post COVID-19-related messages (tweets) on Twitter. When posting these tweets, they use hashtags to indicate that their tweets are related to COVID-19. Such hashtags include #COVID19 and #covid19. In Japanese tweets, users often use #NovelCorona or #NovelCoronaVirus. In this paper, we collect Japanese tweets that contained these hashtags and were posted from March to August 2020. We visualize these tweets with word co-occurrence network diagrams and discuss their contents and the trends in people's interests.
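
A minimal sketch of building a word co-occurrence network from tokenized tweets with networkx (the tweets below are placeholders, not data from the study):

```python
import itertools
from collections import Counter
import networkx as nx

tweets = [["mask", "vaccine", "tokyo"], ["mask", "stayhome"], ["vaccine", "tokyo"]]

pair_counts = Counter()
for tokens in tweets:
    pair_counts.update(itertools.combinations(sorted(set(tokens)), 2))

G = nx.Graph()
for (w1, w2), count in pair_counts.items():
    G.add_edge(w1, w2, weight=count)        # edge weight = co-occurrence count

print(G.number_of_nodes(), G.number_of_edges())
```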

16:10
Visualization of Tweets' Contents Related to the 2020 Kyushu Flood Disaster in Japan

ABSTRACT. To minimize damage during disasters, the rapid collection and delivery of accurate information are essential. From this perspective, the utilization of Twitter in the event of a disaster has attracted worldwide attention. In this study, we consider visualizing the contents of the tweets posted during the Kyushu flood disaster in July 2020. We also develop a system to visualize the tweets’ contents and the precipitation on the day simultaneously.

16:30
Glocal Disaster Monitoring System Using Tweet Data and Satellite Images

ABSTRACT. Nowadays, the Internet and smartphones are widespread, and it is easy to transmit and share information. In addition, the accuracy of meteorological forecasts has improved significantly. However, there are still many cases in which casualties result from delays in evacuation during a disaster. Therefore, we have been attempting to construct a "glocal disaster monitoring system" by combining social media information, which is useful for local disaster damage monitoring, with satellite data, which is suitable for global disaster damage monitoring. In this study, we constructed a prototype of the glocal disaster monitoring system that simultaneously displays disaster-related tweets and inundation area estimates obtained from satellite data on a map.

16:50
A Study on Evaluating Recovery from Forest Fire Damage Using Satellite Data

ABSTRACT. The number of forest and steppe fires in Mongolia has been increasing in recent years, including in spring, autumn, and sometimes dry summers. The purpose of this study is investigate the possibility of evaluating the forest fire damages and recovery in Mongolia using data acquired by the optical sensor OLI onboard Landsat-8 satellite. The authors have used two indices, Normalized Burn Ratio (NBR), and Normalized Difference Vegetation Index (NDVI) derived from OLI data. NBR was quite useful for detecting burnt areas. The vegetation recovery of the burnt areas could be identified both by NBR and NDVI.